Cellwise contamination remains a challenging problem for data scientists, particularly in research fields that require the selection of sparse features. Traditional robust methods may be neither feasible nor efficient for such contaminated datasets. We propose CR-Lasso, a robust Lasso-type cellwise regularization procedure that performs feature selection in the presence of cellwise outliers by simultaneously minimising a regression loss and a cell deviation measure. To evaluate the approach, we conduct empirical studies comparing its selection and prediction performance with that of several sparse regression methods. We show that CR-Lasso is competitive under the settings considered. We illustrate the effectiveness of the proposed method on real data through an analysis of a bone mineral density dataset.
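One way to see why methods that discard or downweight entire rows can struggle here is to count how many rows are touched by at least one outlying cell. The sketch below is illustrative only: it assumes each cell is contaminated independently at rate `eps`, and the function name `contaminated_row_fraction` and the values of `n`, `p`, and `eps` are hypothetical choices, not settings taken from this work.

```python
import numpy as np


def contaminated_row_fraction(eps: float, p: int) -> float:
    """Probability that a row contains at least one contaminated cell,
    assuming each of its p cells is contaminated independently at rate eps."""
    return 1.0 - (1.0 - eps) ** p


# Hypothetical illustration settings (assumptions, not from the paper).
n, p, eps = 10_000, 50, 0.05

# Simulate a cellwise contamination mask: True marks an outlying cell.
rng = np.random.default_rng(0)
mask = rng.random((n, p)) < eps

# Fraction of rows with at least one outlying cell.
empirical = mask.any(axis=1).mean()
theoretical = contaminated_row_fraction(eps, p)

print(f"theoretical: {theoretical:.3f}, empirical: {empirical:.3f}")
```

Under these assumptions, a 5% cell contamination rate across 50 features leaves more than 90% of rows with at least one outlying cell, so a rowwise approach would have few fully clean observations to rely on.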