We compare two linear dimensionality reduction strategies for the multigroup classification problem: the trace ratio method and Fisher's discriminant analysis. Trace ratio optimization has recently gained popularity because of its computational efficiency and its occasionally better classification results. However, its statistical properties are still not fully understood. We study and compare the properties of the two methods. We then propose a robust version of the trace ratio method to handle outliers in the data, and we reinterpret an asymptotic perturbation bound for the trace ratio solution in a contamination setting. Finally, we compare the performance of the trace ratio method and Fisher's discriminant analysis on both synthetic and real datasets, using classical and robust estimators.