Cellwise outliers are widespread in data, and traditional robust methods may fail when applied to datasets with this type of contamination. We propose a variable selection procedure that first uses a pairwise robust estimator to obtain an initial empirical covariance matrix for the response and potentially many predictors. We then replace the original design matrix and response vector with robust counterparts constructed from the estimated covariance matrix. Finally, we apply the adaptive Lasso to these counterparts to obtain the variable selection results. The proposed approach is robust to cellwise outliers in both regular and high-dimensional settings. Empirical results show good performance compared with recently proposed robust alternatives, particularly in the challenging setting where the contamination rate is high but the magnitude of the outliers is moderate. Real data applications demonstrate the practical utility of the proposed method.
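The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the pairwise estimator is taken to be the Gnanadesikan-Kettenring estimator with a MAD scale, the covariance matrix is made positive definite by eigenvalue clipping, the robust counterparts are built from the symmetric square root of the predictor block, and the adaptive Lasso weights come from a ridge-stabilised initial fit. The function names (`pairwise_robust_cov`, `robust_counterparts`, `weighted_lasso`, `adaptive_lasso`) are hypothetical.

```python
import numpy as np


def mad_scale(v):
    """Median absolute deviation, scaled for consistency at the normal."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))


def pairwise_robust_cov(Z):
    """Assumed pairwise estimator: Gnanadesikan-Kettenring with MAD scale."""
    p = Z.shape[1]
    S = np.empty((p, p))
    for j in range(p):
        S[j, j] = mad_scale(Z[:, j]) ** 2
        for k in range(j):
            s_plus = mad_scale(Z[:, j] + Z[:, k]) ** 2
            s_minus = mad_scale(Z[:, j] - Z[:, k]) ** 2
            S[j, k] = S[k, j] = (s_plus - s_minus) / 4.0
    # Pairwise estimates need not be positive definite: clip the eigenvalues.
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 1e-8, None)
    return (vecs * vals) @ vecs.T


def robust_counterparts(S):
    """Assumed construction: X_t' X_t = S_xx and X_t' y_t = S_xy.

    Column 0 of S is taken to be the response.
    """
    S_xx, S_xy = S[1:, 1:], S[1:, 0]
    vals, vecs = np.linalg.eigh(S_xx)
    X_t = (vecs * np.sqrt(vals)) @ vecs.T           # S_xx^{1/2}
    y_t = (vecs / np.sqrt(vals)) @ vecs.T @ S_xy    # S_xx^{-1/2} S_xy
    return X_t, y_t


def weighted_lasso(X, y, lam, w, n_iter=500):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * sum_j w_j |b_j|."""
    p = X.shape[1]
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                     # remove coordinate j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_ss[j]
            r -= X[:, j] * b[j]                     # add it back
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive Lasso with weights from a ridge-stabilised initial fit."""
    p = X.shape[1]
    b_init = np.linalg.solve(X.T @ X + 1e-6 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(b_init) ** gamma + 1e-8)
    return weighted_lasso(X, y, lam, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(size=n)
    Z = np.column_stack([y, X])
    Z[rng.random(Z.shape) < 0.05] = 10.0            # 5% cellwise contamination
    X_t, y_t = robust_counterparts(pairwise_robust_cov(Z))
    print(adaptive_lasso(X_t, y_t, lam=0.5))
```

Because the counterparts are built from a covariance matrix rather than from the raw rows, the penalty `lam` in this sketch lives on the covariance scale and would need to be tuned (for example by an information criterion) in any real use.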