We study robust high-dimensional sparse regression under finite-variance heavy-tailed noise, epsilon-contamination, and alpha-mixing dependence, using two subsampling estimators: Adaptive Importance Sampling (AIS) and Stratified Sub-sampling (SS). Under a sub-Gaussian design, whose scope we delimit precisely, and finite-variance noise, a subsample of size m achieves the minimax-optimal rate. We close the theory-algorithm gap: Theorem 4.6 applies to AIS at termination, conditional on stabilized weights (Proposition 4.1), and SS fits the median-of-means M-estimation framework of Lecue and Lerasle (Proposition 4.3). The de-biasing step is fully specified via the nodewise-Lasso precision estimator under a new sparse-precision assumption, yielding valid coordinate-wise confidence intervals (Theorem 4.14). The alpha-mixing extension uses a calendar-time block protocol that guarantees temporal separation (Theorem 4.12). Empirically, AIS achieves 3.10 times lower error than uniform subsampling at 20% contamination, and 29.5% lower test MSE on the Riboflavin dataset (p = 4,088, n = 71).
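For readers unfamiliar with the median-of-means idea that the SS analysis builds on, the following is a minimal sketch for the simplest case, estimating a scalar mean. It is not the paper's SS estimator: the function name `median_of_means`, the block count, the Student-t noise, and the 2% contamination level are all illustrative assumptions chosen so that fewer than half of the blocks contain an outlier.

```python
import numpy as np


def median_of_means(x, n_blocks, rng):
    """Median-of-means estimate of the mean of a 1-D sample (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    perm = rng.permutation(x.size)              # random assignment of points to blocks
    blocks = np.array_split(x[perm], n_blocks)  # near-equal-size disjoint blocks
    block_means = [b.mean() for b in blocks]
    return float(np.median(block_means))        # median is robust to a minority of bad blocks


rng = np.random.default_rng(0)
n = 1000

# Heavy-tailed noise with finite variance and true mean 0 (Student-t, 3 degrees of freedom).
clean = rng.standard_t(df=3, size=n)

# Assumed contamination: replace 2% of the points with a large outlier value.
contaminated = clean.copy()
contaminated[:20] = 1e4

naive = float(contaminated.mean())
mom = median_of_means(contaminated, n_blocks=100, rng=rng)

print(f"empirical mean:  {naive:.3f}")
print(f"median of means: {mom:.3f}")
```

With 20 outliers and 100 blocks, at most 20 block means can be corrupted, so the median is taken among mostly clean blocks and stays near the true mean, whereas the empirical mean is dragged far away. The guarantee degrades once outliers can reach half of the blocks, which is why the number of blocks must scale with the number of contaminated points.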