Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past the focus was on casewise outliers, that is, data points that lie far from the bulk of the data. More recently, interest has grown in cellwise outliers, which are unexpected entries in the data matrix. Removing an entire case because it has one or a few outlying cells would discard much information. Cellwise robust methods aim to detect the outlying cells while preserving the information in the other cells. We propose a DA method that is trained by estimating the location and covariance of each class with cellwise and casewise robust estimators, which can also handle missing values (NAs). The main novelty of our approach lies in the prediction on test data, which may themselves contain outlying cells and NAs. The new robust discriminant function is derived from a novel statistical model by penalized maximum likelihood. We focus on quadratic DA, but also cover linear DA. The new cellQDA and cellLDA methods perform well in simulations. The approach is illustrated on real data, and the results are interpreted with the help of graphical displays.
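To make the motivation concrete, the following is a minimal sketch of the classical quadratic discriminant rule, not of the cellQDA method proposed here. It assumes known class means, covariance matrices, and priors, and all numerical values are made up for illustration. It shows how a single outlying cell in a test case can change the predicted class even though the remaining cells clearly point to the original class.

```python
import numpy as np

def qda_scores(x, means, covs, priors):
    """Classical quadratic discriminant scores of one case x, one score per class."""
    scores = []
    for mu, sigma, prior in zip(means, covs, priors):
        diff = x - mu
        # log-determinant of the class covariance matrix
        _, logdet = np.linalg.slogdet(sigma)
        # squared Mahalanobis distance of x to the class center
        maha = diff @ np.linalg.solve(sigma, diff)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return np.array(scores)

# Hypothetical two-class setting in three dimensions (illustrative values only).
means = [np.zeros(3), np.full(3, 2.0)]
covs = [np.eye(3), np.eye(3)]
priors = [0.5, 0.5]

# A test case located at the center of class 1.
clean = np.array([2.0, 2.0, 2.0])

# The same case with a single contaminated cell.
contaminated = clean.copy()
contaminated[2] = -20.0

pred_clean = int(np.argmax(qda_scores(clean, means, covs, priors)))
pred_contaminated = int(np.argmax(qda_scores(contaminated, means, covs, priors)))
print("clean case -> class", pred_clean)
print("contaminated case -> class", pred_contaminated)
```

In this toy setting the two uncontaminated cells still agree with class 1, but the outlying cell dominates the Mahalanobis distance and moves the case to class 0. A cellwise robust prediction rule aims to flag such a cell rather than let it decide the classification.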