The asymptotically precise estimation of the generalization of kernel methods has recently received attention due to the parallels between neural networks and their associated kernels. However, prior works derive such estimates for training by kernel ridge regression (KRR), whereas neural networks are typically trained with gradient descent (GD). In the present work, we consider the training of kernels with a family of $\textit{spectral algorithms}$ specified by a profile $h(\lambda)$ and including KRR and GD as special cases. We then derive the generalization error as a functional of the learning profile $h(\lambda)$ for two data models: a high-dimensional Gaussian model and a low-dimensional translation-invariant model. Under power-law assumptions on the spectrum of the kernel and target, we use our framework to (i) give the full loss asymptotics for both noisy and noiseless observations, (ii) show that the loss localizes on certain spectral scales, giving a new perspective on the KRR saturation phenomenon, and (iii) conjecture, and demonstrate for the considered data models, the universality of the loss with respect to non-spectral details of the problem, but only in the case of noisy observations.
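As a concrete illustration of the learning profile (a minimal sketch in our own notation, with assumed ridge parameter $\eta$, learning rate $\alpha$, and step count $t$, none of which are fixed by the abstract itself): a spectral algorithm produces the estimator $\hat{f}(x) = k(x)^\top h(K)\, y$, where $h$ acts on the spectrum of the empirical kernel matrix $K$, and the two classical special cases correspond to
\[
h_{\mathrm{KRR}}(\lambda) = \frac{1}{\lambda + \eta},
\qquad
h_{\mathrm{GD}}(\lambda) = \frac{1 - (1 - \alpha\lambda)^{t}}{\lambda},
\]
so that KRR with ridge $\eta$ and $t$ steps of GD on the least-squares objective are recovered from the same family; both profiles approach $1/\lambda$ as the regularization vanishes ($\eta \to 0$ or $t \to \infty$).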