We consider the problem of simultaneous variable selection and estimation of the corresponding regression coefficients in ultra-high dimensional linear regression models, an extremely important problem in recent years. Adaptive penalty functions are used for this purpose to achieve the oracle variable selection property with a lower computational burden. However, the usual adaptive procedures (e.g., adaptive LASSO) based on the squared error loss function are extremely non-robust in the presence of data contamination, which is quite common in large-scale data (e.g., noisy gene expression data, spectra and spectral data). In this paper, we present a regularization procedure for ultra-high dimensional data that combines a robust loss function, based on the popular density power divergence (DPD) measure, with the adaptive LASSO penalty. We theoretically study the robustness and the large-sample properties of the proposed adaptive robust estimators for a general class of error distributions; in particular, we show that the proposed adaptive DPD-LASSO estimator is highly robust and satisfies the oracle variable selection property, and that the corresponding estimators of the regression coefficients are consistent and asymptotically normal under an easily verifiable set of assumptions. Numerical illustrations are provided for the most commonly used normal error density. Finally, the proposal is applied to an interesting spectral dataset from the field of chemometrics, concerning the electron-probe X-ray microanalysis (EPXMA) of archaeological glass vessels from the 16th and 17th centuries.
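To make the shape of the proposed objective concrete, the following is a minimal sketch of a DPD loss for normal errors combined with an adaptive LASSO penalty. The abstract does not state the exact formulas, so everything here is an assumption: the loss is the standard empirical DPD objective for a N(0, sigma^2) error density with tuning parameter alpha > 0, the adaptive weights are taken as the reciprocal of an initial estimate (the classical adaptive LASSO choice), and the function names `dpd_loss_normal`, `adaptive_lasso_penalty`, and `objective` are hypothetical.

```python
import numpy as np


def dpd_loss_normal(beta, sigma, X, y, alpha):
    """Empirical DPD loss for a linear model with N(0, sigma^2) errors.

    Assumes alpha > 0. The additive constant that does not depend on
    (beta, sigma) is dropped, as is usual for minimization.
    """
    residuals = y - X @ beta
    # Common factor (2*pi)^(-alpha/2) * sigma^(-alpha) of both terms.
    scale = (2.0 * np.pi) ** (-alpha / 2.0) * sigma ** (-alpha)
    # Integral of the model density raised to the power (1 + alpha).
    integral_term = 1.0 / np.sqrt(1.0 + alpha)
    # Each observation contributes through exp(-alpha * r^2 / (2 sigma^2)),
    # which is bounded and vanishes for very large residuals.
    data_term = np.mean(np.exp(-alpha * residuals**2 / (2.0 * sigma**2)))
    return scale * (integral_term - (1.0 + alpha) / alpha * data_term)


def adaptive_lasso_penalty(beta, beta_init, lam, eps=1e-8):
    """Weighted L1 penalty with weights 1 / |beta_init| (assumed choice).

    eps is a hypothetical guard against division by zero.
    """
    weights = 1.0 / (np.abs(beta_init) + eps)
    return lam * np.sum(weights * np.abs(beta))


def objective(beta, sigma, X, y, alpha, beta_init, lam):
    """Penalized objective: DPD loss plus adaptive LASSO penalty."""
    return dpd_loss_normal(beta, sigma, X, y, alpha) + adaptive_lasso_penalty(
        beta, beta_init, lam
    )
```

Because each observation enters the loss only through a bounded exponential term, a single gross outlier changes the loss by at most a bounded amount, whereas its contribution to a squared error loss grows without limit. This is the intuition behind the robustness claim, not a reproduction of the paper's proofs or of its estimation algorithm.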