We consider unregularized robust M-estimators for linear models under Gaussian design and heavy-tailed noise, in the proportional asymptotics regime where the sample size $n$ and the number of features $p$ both grow such that $p/n \to \gamma\in (0,1)$. We analyse an estimator of the out-of-sample error of a robust M-estimator and prove that it is consistent for a large family of loss functions that includes the Huber loss. As an application of this result, we propose an adaptive tuning procedure for the scale parameter $\lambda>0$ of a given loss function $\rho$: choosing $\hat \lambda$ in a given interval $I$ to minimize the out-of-sample error estimate of the M-estimator constructed with loss $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ leads to the optimal out-of-sample error over $I$. The proof relies on a smoothing argument: the unregularized M-estimation objective function is perturbed, or smoothed, with a ridge penalty that vanishes as $n\to+\infty$, and we show that the unregularized M-estimator of interest inherits properties of its smoothed version.
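To make the scaling $\rho_\lambda(\cdot) = \lambda^2 \rho(\cdot/\lambda)$ and the tuning problem concrete, the following is a minimal simulation sketch, not the procedure analysed in the paper. It assumes the Huber loss, an identity-covariance Gaussian design, Student-t noise with 2 degrees of freedom, and an iteratively reweighted least squares solver; the names `huber`, `huber_fit` and `tune_lambda`, the grid, and the sample sizes are all illustrative choices. Because the abstract does not state the formula of the out-of-sample error estimate, the sketch selects $\lambda$ with the oracle error $\|\hat\beta_\lambda - \beta\|^2$, which is available only in simulation; the data-driven estimate would take its place in practice.

```python
import numpy as np


def huber(x, lam=1.0):
    """Scaled Huber loss rho_lam(x) = lam^2 * rho(x / lam), rho = unit Huber loss."""
    u = np.abs(x) / lam
    rho = np.where(u <= 1.0, 0.5 * u**2, u - 0.5)
    return lam**2 * rho


def huber_fit(X, y, lam, n_iter=200, tol=1e-10):
    """Unregularized Huber M-estimator, computed by iteratively reweighted
    least squares (an illustrative solver choice)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Weights psi(r)/r: 1 when |r| <= lam, lam/|r| otherwise.
        w = np.minimum(1.0, lam / np.maximum(np.abs(r), 1e-12))
        Xw = X * w[:, None]
        new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.linalg.norm(new - beta) < tol
        beta = new
        if converged:
            break
    return beta


def tune_lambda(n=400, p=80, grid=None, seed=0):
    """Pick lambda on a grid by the ORACLE out-of-sample error (simulation only)."""
    rng = np.random.default_rng(seed)
    if grid is None:
        grid = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
    X = rng.standard_normal((n, p))        # Gaussian design, identity covariance
    beta_true = rng.standard_normal(p) / np.sqrt(p)
    noise = rng.standard_t(df=2, size=n)   # heavy-tailed noise (assumed choice)
    y = X @ beta_true + noise
    # With identity covariance, E[(x_new^T (beta_hat - beta))^2] equals
    # ||beta_hat - beta||^2, so this is the out-of-sample error.
    errors = np.array(
        [np.sum((huber_fit(X, y, lam) - beta_true) ** 2) for lam in grid]
    )
    return grid[np.argmin(errors)], errors, grid


if __name__ == "__main__":
    lam_hat, errors, grid = tune_lambda()
    for lam, err in zip(grid, errors):
        print(f"lambda={lam:5.2f}  oracle out-of-sample error={err:.4f}")
    print("selected lambda:", lam_hat)
```

With the Huber loss, $\rho_\lambda(x)$ equals $x^2/2$ for $|x|\le\lambda$ and $\lambda|x| - \lambda^2/2$ otherwise, so $\lambda$ sets the threshold beyond which residuals are penalized linearly rather than quadratically.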