The distribution of Ridgeless least squares interpolators

The Ridgeless minimum $\ell_2$-norm interpolator in overparametrized linear regression has attracted considerable attention in recent years. While it seems to defy the conventional wisdom that overfitting leads to poor prediction, recent research reveals that its norm-minimizing property induces an `implicit regularization' that helps prediction despite interpolation. This renders the Ridgeless interpolator a theoretically tractable proxy that offers useful insights into the mechanisms of modern machine learning methods. This paper takes a different perspective, aiming to understand the precise stochastic behavior of the Ridgeless interpolator as a statistical estimator. Specifically, we characterize the distribution of the Ridgeless interpolator in high dimensions in terms of a Ridge estimator in an associated Gaussian sequence model with positive regularization, where this positive regularization plays the role of the prescribed implicit regularization in the context of prediction risk. Our distributional characterizations hold for general random designs and extend uniformly to positively regularized Ridge estimators. As a demonstration of the analytic power of these characterizations, we derive approximate formulae for a general class of weighted $\ell_q$ risks for Ridge(less) estimators; such formulae were previously available only for $\ell_2$. Our theory also offers a further conceptual reconciliation with the conventional wisdom: given any data covariance, a certain amount of regularization in Ridge regression remains beneficial for `most' signals across various statistical tasks, including prediction, estimation and inference, as long as the noise level is non-trivial. Surprisingly, optimal tuning can be achieved simultaneously for all the designated statistical tasks by a single generalized or $k$-fold cross-validation scheme, even though such schemes are designed specifically for tuning prediction risk.
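For concreteness, the objects named above can be illustrated numerically. The following is a minimal sketch and not the paper's construction: it assumes the standard definitions of the Ridge estimator with a $1/n$ normalization (the normalization used in the paper may differ), uses synthetic isotropic Gaussian data chosen purely for illustration, and the function names `ridge` and `ridgeless` are hypothetical. It checks that the minimum $\ell_2$-norm solution interpolates the data when $p > n$ and that it coincides with the limit of the Ridge estimator as the regularization tends to zero.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator (X^T X / n + lam * I)^{-1} X^T y / n, for lam > 0.

    The 1/n normalization is an assumption made for this sketch.
    """
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def ridgeless(X, y):
    """Minimum l2-norm solution of X b = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

# Synthetic overparametrized problem (p > n); all values are illustrative.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

b0 = ridgeless(X, y)

# The Ridgeless estimator fits the noisy training data exactly.
interpolation_error = np.max(np.abs(X @ b0 - y))

# Ridge with a very small penalty is numerically close to the Ridgeless limit.
gap_to_small_ridge = np.linalg.norm(ridge(X, y, 1e-6) - b0)

print(f"max |X b0 - y|        = {interpolation_error:.2e}")
print(f"||ridge(1e-6) - b0||  = {gap_to_small_ridge:.2e}")
```

The sketch only illustrates the estimators themselves; it does not reproduce the distributional characterization, the weighted $\ell_q$ risk formulae, or the cross-validation results described in the abstract.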
