Regression is the workhorse of statistics, and is often faced with real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by existing casewise robust regression methods. It is another matter when cellwise outliers occur, that is, suspicious individual entries in the data matrix containing the regressors and the response. We propose a new regression method that is robust to both casewise and cellwise outliers, and handles missing values as well. Its construction allows for skewed distributions. We show that it obeys the first breakdown result for cellwise robust regression. It is also the first such method that is geared to making robust out-of-sample predictions. Its performance is studied by simulation, and it is illustrated on a substantial real dataset.