The distribution of Ridgeless least squares interpolators

The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years in both the machine learning and statistics communities. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent theoretical research on its $\ell_2$-type risks reveals that its norm-minimizing property induces an `implicit regularization' that helps prediction in spite of interpolation. This paper takes a further step, aiming to understand its precise stochastic behavior as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, which provides a precise quantification of this implicit regularization in the most general distributional sense. Our distributional characterizations hold for general non-Gaussian random designs and extend uniformly to positively regularized Ridge estimators. As a direct application, we obtain a complete characterization of a general class of weighted $\ell_q$ risks of the Ridge(less) estimators, which were previously known only for $q=2$ by random matrix methods. These weighted $\ell_q$ risks include not only the standard prediction and estimation errors but also non-standard covariate shift settings. Our uniform characterizations further reveal a surprising feature of the commonly used generalized and $k$-fold cross-validation schemes: tuning the estimated $\ell_2$ prediction risk by these methods alone leads to simultaneously optimal $\ell_2$ in-sample, prediction and estimation risks, as well as the optimal length of debiased confidence intervals.
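To make the object of study concrete, the following is a minimal NumPy sketch, not the paper's code, of the Ridgeless interpolator as the limit of Ridge estimators when the regularization tends to zero. The function names `ridgeless` and `ridge`, the simulated Gaussian data, and the $1/n$ normalization of the squared loss are all assumptions made for illustration; the abstract does not fix them.

```python
import numpy as np

def ridgeless(X, y):
    """Minimum l2-norm solution of X b = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

def ridge(X, y, lam):
    """Ridge estimator argmin_b (1/n)||y - X b||^2 + lam ||b||^2 (assumed scaling).

    Uses the dual form X^T (X X^T + n*lam*I)^{-1} y, convenient when p > n.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + n * lam * np.eye(n), y)

# Hypothetical overparametrized data: p > n, Gaussian design and noise.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

b0 = ridgeless(X, y)

# 1) Interpolation: the training residual vanishes up to rounding error.
interp_resid = np.max(np.abs(X @ b0 - y))

# 2) Ridge with tiny positive regularization is close to the Ridgeless solution.
gap = np.linalg.norm(ridge(X, y, 1e-8) - b0)

# 3) Norm minimality: adding a null-space direction of X gives another
#    interpolator, but with strictly larger l2 norm.
z = rng.standard_normal(p)
null_dir = z - np.linalg.pinv(X) @ (X @ z)  # projection of z onto null(X)
other = b0 + null_dir
other_resid = np.max(np.abs(X @ other - y))

print(interp_resid, gap, np.linalg.norm(b0), np.linalg.norm(other))
```

The third check illustrates the norm-minimizing property mentioned above: among all vectors that fit the data exactly, the pseudoinverse solution has the smallest $\ell_2$ norm. The sketch does not reproduce the distributional characterization itself, which involves the associated Gaussian sequence model.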
