Universality of Kernel Random Matrices and Kernel Regression in the Quadratic Regime

Kernel ridge regression (KRR) is a popular class of machine learning models that has become an important tool for understanding deep learning. Most work so far has studied the proportional asymptotic regime, $n \asymp d$, where $n$ is the number of training samples and $d$ is the dimension of the data. In the proportional regime, under certain conditions on the data distribution, the kernel random matrix involved in KRR behaves like that of a linear kernel. In this work, we extend the study of kernel regression to the quadratic asymptotic regime, where $n \asymp d^2$. In this regime, we demonstrate that a broad class of inner-product kernels behaves like a quadratic kernel. Specifically, we establish an operator norm bound on the difference between the original kernel random matrix and a quadratic kernel random matrix that carries additional correction terms beyond the Taylor expansion of the kernel function. The approximation holds for general data distributions under a Gaussian-moment-matching assumption with a covariance structure. We use this approximation to derive the limiting spectral distribution of the original kernel matrix and to characterize the precise asymptotic training and test errors of KRR in the quadratic regime, when $n/d^2$ converges to a non-zero constant. The generalization errors are obtained for (i) a random teacher model and (ii) a deterministic teacher model whose weights are perfectly aligned with the covariance of the data. In the random teacher setting, we also verify that the generalized cross-validation (GCV) estimator consistently estimates the generalization error in the quadratic regime for anisotropic data. Our proof techniques combine moment methods, Wick's formula, orthogonal polynomials, and resolvent analysis of random matrices with correlated entries.
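To make the setting concrete, the following is a minimal numerical sketch of KRR with an inner-product kernel at sample size $n \asymp d^2$, comparing the GCV score against a held-out test error. It is an illustration under stated assumptions and not the construction analyzed in this work: the kernel normalization $f(\langle x_i, x_j\rangle/d)$, the choice $f = \exp$, isotropic Gaussian data, the random quadratic teacher, and all function and parameter names (`inner_product_kernel`, `krr_fit`, `gcv_estimate`, `run_demo`, `ratio`, `lam`, `noise`) are hypothetical choices made for the example. The GCV score uses the standard ridge form $\frac{1}{n}\|(I-S)y\|^2 / \big(\frac{1}{n}\operatorname{tr}(I-S)\big)^2$ with smoother $S = K(K+\lambda I)^{-1}$.

```python
import numpy as np


def inner_product_kernel(A, B, f=np.exp):
    """Matrix with entries f(<a_i, b_j> / d) for rows a_i of A and b_j of B.

    The 1/d normalization and f = exp are assumptions made for this sketch.
    """
    d = A.shape[1]
    return f(A @ B.T / d)


def krr_fit(K, y, lam):
    """Dual KRR coefficients: solve (K + lam * I) alpha = y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def gcv_estimate(K, y, lam):
    """Standard GCV score for the ridge parameter lam.

    Uses I - S = lam * (K + lam * I)^{-1}, where S = K (K + lam * I)^{-1}.
    """
    n = K.shape[0]
    R = lam * np.linalg.inv(K + lam * np.eye(n))
    resid = R @ y
    return (resid @ resid / n) / (np.trace(R) / n) ** 2


def run_demo(d=20, ratio=1.0, lam=0.1, noise=0.1, n_test=500, seed=0):
    """Return (GCV score, held-out test MSE) with n = ratio * d^2 samples."""
    rng = np.random.default_rng(seed)
    n = int(ratio * d**2)
    X = rng.standard_normal((n, d))            # isotropic Gaussian data (assumption)
    X_test = rng.standard_normal((n_test, d))

    # Hypothetical random quadratic teacher: y = x^T W x / d + noise.
    W = rng.standard_normal((d, d))
    W = (W + W.T) / 2

    def teacher(Z):
        return np.einsum("ij,jk,ik->i", Z, W, Z) / d

    y = teacher(X) + noise * rng.standard_normal(n)
    y_test = teacher(X_test) + noise * rng.standard_normal(n_test)

    K = inner_product_kernel(X, X)
    alpha = krr_fit(K, y, lam)
    pred = inner_product_kernel(X_test, X) @ alpha
    test_mse = float(np.mean((pred - y_test) ** 2))
    return float(gcv_estimate(K, y, lam)), test_mse


if __name__ == "__main__":
    gcv, mse = run_demo()
    print(f"GCV score: {gcv:.4f}   held-out test MSE: {mse:.4f}")
```

Running the script prints the two quantities side by side for one random draw. At this small size the numbers only give a rough feel for the comparison; the consistency statement above is asymptotic and covers anisotropic data, which this sketch does not attempt to reproduce.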
