It is by now well established that modern over-parameterized models seem to elude the bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to analyze this phenomenon in the relatively tractable setting of kernel regression. However, as we argue in detail, most past works on this topic either make unrealistic assumptions or focus on a narrow problem setup. This work aims to provide a unified theory to upper bound the excess risk of kernel regression for nearly all common and realistic settings. Specifically, we provide rigorous bounds that hold for common kernels and for any amount of regularization, any noise level, any input dimension, and any number of samples. Furthermore, we provide relative perturbation bounds for the eigenvalues of kernel matrices, which may be of independent interest. These reveal a self-regularization phenomenon, whereby a heavy tail in the eigendecomposition of the kernel provides it with an implicit form of regularization, enabling good generalization. When applied to common kernels, our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression. As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
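For readers who want a concrete picture of the objects discussed above, the following is a minimal sketch of kernel ridge regression and of the eigenvalues of the normalized kernel matrix. It is an illustration only, not the construction analyzed in this work: the Gaussian kernel, the `bandwidth` parameter, the `n * ridge` scaling of the regularizer, the synthetic data, and all function names are assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z (assumed kernel)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def kernel_ridge_fit(X, y, ridge, bandwidth=1.0):
    """Solve (K + n * ridge * I) alpha = y; the n * ridge scaling is an assumed convention."""
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * ridge * np.eye(n), y)


def kernel_ridge_predict(X_train, alpha, X_test, bandwidth=1.0):
    """Evaluate the fitted regressor at the rows of X_test."""
    return rbf_kernel(X_test, X_train, bandwidth) @ alpha


def kernel_matrix_eigenvalues(X, bandwidth=1.0):
    """Eigenvalues of K / n in descending order."""
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.eigvalsh(K / X.shape[0])[::-1]


# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
n, d, ridge = 200, 5, 1e-3
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

alpha = kernel_ridge_fit(X, y, ridge)
X_test = rng.standard_normal((50, d))
predictions = kernel_ridge_predict(X, alpha, X_test)
eigenvalues = kernel_matrix_eigenvalues(X)
```

Inspecting how quickly `eigenvalues` decays, and how the fit changes as `ridge` is decreased toward zero, gives an informal feel for the interplay between spectrum and regularization that the bounds quantify.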