High-dimensional data often exhibit heterogeneity due to heteroscedastic variance or inhomogeneous covariate effects. Penalized quantile and expectile regression methods offer useful tools for detecting heteroscedasticity in high-dimensional data. The former is computationally challenging because the check loss is non-smooth, and the latter is sensitive to heavy-tailed error distributions. In this paper, we propose and study (penalized) robust expectile regression (retire), with a focus on iteratively reweighted $\ell_1$-penalization, which reduces the estimation bias induced by $\ell_1$-penalization and leads to oracle properties. Theoretically, we establish the statistical properties of the retire estimator under two regimes: (i) the low-dimensional regime, in which $d \ll n$; and (ii) the high-dimensional regime, in which $s \ll n \ll d$, with $s$ denoting the number of significant predictors. In the high-dimensional setting, we carefully characterize the solution path of the iteratively reweighted $\ell_1$-penalized retire procedure, which is adapted from the local linear approximation (LLA) algorithm for folded-concave regularization. Under a mild minimum signal strength condition, we show that after on the order of $\log(\log d)$ iterations the final iterate achieves the oracle convergence rate. At each iteration, the weighted $\ell_1$-penalized convex program can be solved efficiently by a semismooth Newton coordinate descent algorithm. Numerical studies demonstrate the competitive performance of the proposed procedure compared with both non-robust and quantile-regression-based alternatives.
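To make the iteratively reweighted procedure concrete, here is a minimal NumPy sketch under several stated assumptions. We assume the retire loss takes the asymmetric Huber form $\ell_{\tau,\gamma}(u) = |\tau - 1(u<0)|\,\ell_\gamma(u)$, where $\ell_\gamma$ is the Huber loss with robustification parameter $\gamma$. We assume the folded-concave penalty is SCAD (with $a = 3.7$), whose derivative supplies the LLA weights; the first stage uses constant weights $\lambda$, i.e. the plain $\ell_1$-penalized fit. Each weighted $\ell_1$ program is solved here by plain proximal gradient descent (ISTA), a simpler stand-in and not the semismooth Newton coordinate descent solver named above. All function names (`retire_loss`, `irw_l1_retire`, etc.) and tuning values are hypothetical.

```python
import numpy as np

def retire_loss(u, tau, gamma):
    """Assumed retire loss: |tau - 1(u < 0)| times the Huber loss with threshold gamma."""
    w = np.where(u < 0, 1.0 - tau, tau)
    a = np.abs(u)
    huber = np.where(a <= gamma, 0.5 * u**2, gamma * a - 0.5 * gamma**2)
    return w * huber

def retire_score(u, tau, gamma):
    """Derivative of retire_loss with respect to u (asymmetrically weighted, clipped residual)."""
    w = np.where(u < 0, 1.0 - tau, tau)
    return w * np.clip(u, -gamma, gamma)

def scad_derivative(t, lam, a=3.7):
    """SCAD penalty derivative p'_lam(|t|), used here (by assumption) as the LLA weight."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_l1_retire(X, y, weights, tau, gamma, theta0=None, n_iter=500):
    """Minimise (1/n) sum retire_loss(y - b0 - X b) + sum_j weights[j] |b_j| by ISTA.

    Returns theta = (b0, b), with the intercept b0 left unpenalised.
    """
    n, d = X.shape
    Z = np.column_stack([np.ones(n), X])  # intercept in column 0
    theta = np.zeros(d + 1) if theta0 is None else np.array(theta0, dtype=float)
    # psi = retire_score is max(tau, 1 - tau)-Lipschitz, so this L bounds the
    # Lipschitz constant of the gradient of the empirical loss.
    L = max(tau, 1.0 - tau) * np.linalg.norm(Z, 2) ** 2 / n
    step = 1.0 / L
    pen = np.concatenate([[0.0], weights])  # zero penalty on the intercept
    for _ in range(n_iter):
        r = y - Z @ theta
        grad = -Z.T @ retire_score(r, tau, gamma) / n
        z = theta - step * grad
        # Proximal step for the weighted l1 penalty: coordinatewise soft-thresholding.
        theta = np.sign(z) * np.maximum(np.abs(z) - step * pen, 0.0)
    return theta

def irw_l1_retire(X, y, lam, tau=0.5, gamma=1.0, n_stages=3):
    """LLA-style iteratively reweighted l1 retire; stage 1 is the plain l1-penalised fit."""
    d = X.shape[1]
    theta = weighted_l1_retire(X, y, np.full(d, lam), tau, gamma)
    for _ in range(n_stages - 1):
        w = scad_derivative(theta[1:], lam)  # large coefficients receive zero penalty
        theta = weighted_l1_retire(X, y, w, tau, gamma, theta0=theta)
    return theta

if __name__ == "__main__":
    # Illustrative data: sparse signal with heavy-tailed t(3) noise.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_t(3, size=200)
    theta = irw_l1_retire(X, y, lam=0.1, tau=0.5, gamma=1.0)
    print("intercept and first five slopes:", np.round(theta[:6], 3))
```

The reweighting step mirrors the bias-reduction argument: once a coefficient exceeds $a\lambda$ in magnitude, its SCAD weight drops to zero, so later stages stop shrinking strong signals while still penalizing near-zero coordinates at level $\lambda$.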