We study high-dimensional sparse regression under simultaneous heavy-tailed covariates and noise. Heavy-tailed data affect sparse optimization in two distinct ways: extreme covariates can destabilize the gradient field during global localization, while heavy-tailed noise limits the final statistical accuracy during local refinement. Motivated by this two-phase structure, we propose two-stage RIGHT, a robust sparse first-order method based on coordinate-wise median-of-means (MoM) gradient estimation and delayed sample splitting. The MoM gradient estimator is computationally simple, compatible with hard-thresholded updates, and admits phase-adaptive concentration bounds whose rates depend on the current localization radius. Delayed splitting reuses data during global localization and reserves fresh batches for the shorter refinement stage, reducing the sample-splitting cost. The theoretical results reveal a decoupled rate structure: the design-tail index controls gradient stability and sample complexity, whereas the noise-tail index controls the final statistical rate. We also provide phase-wise lower-bound benchmarks showing that the design-driven localization barrier is intrinsic. Extensive simulation experiments and a real-data analysis illustrate the advantages of the proposed method over existing competitors.
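To make the ingredients named in the abstract concrete, the following is a minimal sketch of a coordinate-wise MoM gradient for the least-squares loss, a hard-thresholding step, and a two-phase loop that reuses one subsample during localization and then draws a fresh batch for each refinement step. It is an illustration under stated assumptions, not the RIGHT algorithm itself: the squared loss, the even split between the reused and reserved data, the fixed step size, and all function and parameter names (`mom_gradient`, `hard_threshold`, `two_stage_sketch`, `t_global`, `t_local`, `n_blocks`) are hypothetical choices made here.

```python
import numpy as np


def mom_gradient(X, y, beta, n_blocks, rng):
    """Coordinate-wise median-of-means estimate of the least-squares gradient."""
    # Randomly partition the sample indices into n_blocks disjoint blocks.
    blocks = np.array_split(rng.permutation(X.shape[0]), n_blocks)
    # Mean gradient of 0.5 * (x^T beta - y)^2 within each block.
    block_grads = np.stack([
        X[idx].T @ (X[idx] @ beta - y[idx]) / len(idx) for idx in blocks
    ])
    # Median across blocks, taken separately for every coordinate.
    return np.median(block_grads, axis=0)


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    s = min(s, v.size)
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argpartition(np.abs(v), -s)[-s:]
        out[keep] = v[keep]
    return out


def two_stage_sketch(X, y, s, step, t_global, t_local, n_blocks, seed=0):
    """Hypothetical two-phase loop: data reuse first, fresh batches afterwards."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    # Assumption: half of the sample is reused for localization and the other
    # half is cut into t_local fresh batches (each needs >= n_blocks points).
    reuse = idx[: n // 2]
    fresh = np.array_split(idx[n // 2:], t_local)

    beta = np.zeros(p)
    # Phase 1 (global localization): every step reuses the same subsample.
    for _ in range(t_global):
        g = mom_gradient(X[reuse], y[reuse], beta, n_blocks, rng)
        beta = hard_threshold(beta - step * g, s)
    # Phase 2 (local refinement): every step consumes a previously unused batch.
    for batch in fresh:
        g = mom_gradient(X[batch], y[batch], beta, n_blocks, rng)
        beta = hard_threshold(beta - step * g, s)
    return beta
```

Because only the `t_local` refinement steps consume fresh data, the number of batches is tied to the shorter second phase rather than to the total iteration count, which is the sample-splitting saving the abstract describes. The sparsity level `s`, the step size, and the block count would in practice need to be chosen according to the paper's theory, which this sketch does not reproduce.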