A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator

We consider learning an unknown target function $f_*$ using kernel ridge regression (KRR) given i.i.d. data $(u_i,y_i)$, $i\leq n$, where $u_i \in U$ is a covariate vector and $y_i = f_* (u_i) +\varepsilon_i \in \mathbb{R}$. A recent line of work has shown empirically that the test error of KRR is well approximated by a closed-form estimate derived from an "equivalent" sequence model that depends only on the spectrum of the kernel operator. However, theoretical justifications of this equivalence have so far relied either on restrictive assumptions, such as subgaussian independent eigenfunctions, or on asymptotic derivations for specific kernels in high dimensions. In this paper, we prove that this equivalence holds for a general class of problems satisfying certain spectral and concentration properties of the kernel eigendecomposition. Specifically, we establish in this setting a non-asymptotic deterministic approximation for the test error of KRR, with explicit bounds, that depends only on the eigenvalues and on the alignment of the target function with the eigenvectors of the kernel. Our proofs rely on a careful derivation of deterministic equivalents for random matrix functionals in the dimension-free regime pioneered by Cheng and Montanari (2022). We apply this setting to several classical examples and show excellent agreement between theoretical predictions and numerical simulations. These results require access to the eigendecomposition of the kernel operator. As an alternative, we prove that, in the same setting, the generalized cross-validation (GCV) estimator concentrates on the test error uniformly over a range of ridge regularization parameters that includes zero (the interpolating solution). As a consequence, GCV can be used to estimate, from data alone, both the test error and the optimal regularization parameter of KRR.
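The spectral approximation of the test error described above can be sketched numerically. This is a minimal sketch, not the paper's code, and it assumes the following: KRR minimizes $\sum_{i\le n}(y_i - f(u_i))^2 + \lambda\|f\|_{\mathcal H}^2$ (un-normalized ridge); the kernel spectrum $(\xi_k)$ is truncated to finitely many strictly positive eigenvalues; $\beta_k = \langle f_*, \psi_k\rangle_{L^2}$ are the target coefficients in a (hypothetical) orthonormal eigenbasis $(\psi_k)$; and $\sigma^2$ is the noise variance. It uses the form of the deterministic equivalent common in this literature. An effective regularization $\kappa > 0$ solves $n - \lambda/\kappa = \sum_k \xi_k/(\xi_k+\kappa)$. With $\mathrm{df}_2 = \sum_k \xi_k^2/(\xi_k+\kappa)^2$, the predicted excess test error is $\big(\kappa^2\sum_k \beta_k^2/(\xi_k+\kappa)^2 + \sigma^2\,\mathrm{df}_2/n\big)/(1-\mathrm{df}_2/n)$. The function names and the example spectrum are hypothetical, and the paper's exact normalization may differ.

```python
import numpy as np


def effective_ridge(xi, n, lam, iters=200):
    """Solve n - lam/kappa = sum_k xi_k / (xi_k + kappa) for kappa > 0.

    h(kappa) = n - lam/kappa - sum_k xi_k/(xi_k + kappa) is strictly
    increasing in kappa, so geometric bisection finds the unique root.
    """
    xi = np.asarray(xi, dtype=float)  # assumed strictly positive eigenvalues

    def h(kappa):
        return n - lam / kappa - np.sum(xi / (xi + kappa))

    hi = (lam + xi.sum()) / n + 1.0  # here h(hi) > 0 is guaranteed
    lo = lam / (2 * n) if lam > 0 else 1e-14 * xi.min()
    if h(lo) >= 0:
        raise ValueError("with lam = 0 the spectrum needs more than n positive eigenvalues")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # bisect on a log scale
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


def predicted_excess_risk(xi, beta, n, lam, sigma2):
    """Deterministic-equivalent estimate of E[(f_*(u) - fhat(u))^2] (hypothetical helper)."""
    xi = np.asarray(xi, dtype=float)
    beta = np.asarray(beta, dtype=float)
    kappa = effective_ridge(xi, n, lam)
    df2 = np.sum(xi**2 / (xi + kappa) ** 2)
    bias = kappa**2 * np.sum(beta**2 / (xi + kappa) ** 2)
    variance = sigma2 * df2 / n
    return (bias + variance) / (1.0 - df2 / n)


# Hypothetical example: polynomially decaying spectrum and target coefficients.
k = np.arange(1, 2001)
xi = k ** -2.0
beta = k ** -1.0
for lam in [0.0, 1e-3, 1e-1]:
    print(lam, predicted_excess_risk(xi, beta, n=200, lam=lam, sigma2=0.1))
```

Because $h(\kappa)$ is monotone, bisection on a log scale is a simple, robust choice. At $\lambda = 0$ (interpolation) a root exists only when there are more than $n$ positive eigenvalues, so the truncated spectrum must be long enough.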

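For the GCV estimator, a common definition is $\mathrm{GCV}(\lambda) = \frac{1}{n}\|y - S_\lambda y\|^2 / (1 - \mathrm{Tr}(S_\lambda)/n)^2$. Here $S_\lambda = K(K+\lambda I)^{-1}$ is the smoother matrix and $K = (k(u_i,u_j))_{i,j\le n}$ is the kernel matrix. Since $I - S_\lambda = \lambda (K+\lambda I)^{-1}$, the factors of $\lambda$ cancel, giving $\mathrm{GCV}(\lambda) = n\, y^\top (K+\lambda I)^{-2} y / \big(\mathrm{Tr}(K+\lambda I)^{-1}\big)^2$. This form stays finite at $\lambda = 0$ whenever $K$ is invertible. The sketch below evaluates it on a grid that includes $\lambda = 0$ and picks the minimizer. The Laplace kernel, bandwidth, synthetic data, and the helper name `gcv_curve` are hypothetical choices for illustration. This is one standard way to define GCV at the interpolating endpoint, not necessarily the paper's exact construction.

```python
import numpy as np


def gcv_curve(K, y, lams):
    """GCV(lam) = n * y^T (K + lam I)^{-2} y / (tr (K + lam I)^{-1})^2 for each lam."""
    d, V = np.linalg.eigh(K)  # K = V diag(d) V^T
    z = V.T @ y  # responses expressed in the eigenbasis of K
    n = len(y)
    out = np.empty(len(lams))
    for i, lam in enumerate(lams):
        r = 1.0 / (d + lam)  # eigenvalues of (K + lam I)^{-1}
        out[i] = n * np.sum(z**2 * r**2) / np.sum(r) ** 2
    return out


# Hypothetical example: Laplace kernel on 1-D inputs, noisy sine target.
rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0.0, 1.0, n)
K = np.exp(-np.abs(u[:, None] - u[None, :]) / 0.5)  # strictly positive definite in 1-D
y = np.sin(2 * np.pi * u) + 0.3 * rng.standard_normal(n)

lams = np.concatenate([[0.0], np.logspace(-4, 1, 30)])  # grid includes the interpolating lam = 0
g = gcv_curve(K, y, lams)
lam_hat = lams[np.argmin(g)]
print("GCV-selected ridge:", lam_hat)
```

One eigendecomposition of $K$ serves the whole grid, so each additional $\lambda$ costs only $O(n)$. Under the un-normalized ridge convention used here, GCV estimates the error on a fresh noisy response. It should therefore be compared with the excess test error plus $\sigma^2$.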