The classical rule of thumb relating the bias-variance tradeoff to model size plays a key role in classical machine learning, but is now well known to break down in the overparameterized regime, as exemplified by the double descent curve. In particular, minimum-norm interpolating estimators can perform well, suggesting the need for a new tradeoff in these settings. Accordingly, we propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term (quantifying the alignment between the regularizer and the interpolator) and a geometric sharpness term on the interpolating manifold (quantifying the effect of local perturbations), yielding a tradeoff analogous to the bias-variance tradeoff. Building on prior analyses that established this information criterion for ridge regularizers, this work first provides a general expression for the interpolating information criterion with $\ell^p$ regularizers, $p \ge 2$. We then extend this to the LASSO interpolator with the $\ell^1$ regularizer, which induces stronger sparsity. Empirical results on real-world datasets with random Fourier features and polynomial features validate our theory, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.
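For concreteness, a standard closed form for the minimum-norm interpolators mentioned above (a well-known fact of overparameterized least squares, not a result of this work): with design matrix $X \in \mathbb{R}^{n \times d}$, $d > n$, of full row rank, the minimum $\ell^2$-norm interpolator is
\[
\hat{\beta} \;=\; \operatorname*{arg\,min}_{\beta \,:\, X\beta = y} \|\beta\|_2 \;=\; X^\top \left( X X^\top \right)^{-1} y .
\]
The $\ell^p$-regularized interpolators studied here can be viewed as playing the analogous role when $\|\beta\|_2$ is replaced by an $\ell^p$ penalty.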