Kernel ridge regression (KRR) is a generalization of linear ridge regression that is non-linear in the data but linear in the parameters. The solution can be obtained either in closed form, which involves solving a system of linear equations, or iteratively through gradient descent. The iterative approach opens up the possibility of changing the kernel during training, which is what we investigate in this paper. We theoretically address the effects this has on model complexity and generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels, in which the bandwidth decreases to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data that decreasing the bandwidth during training outperforms using a constant bandwidth selected by cross-validation or marginal-likelihood maximization. We also show, theoretically and empirically, that with a decreasing bandwidth we can achieve both zero training error combined with good generalization and a double-descent behavior, phenomena that do not occur for KRR with a constant bandwidth but are known to appear for neural networks.
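As a concrete illustration of the two solution routes mentioned above, here is a minimal NumPy sketch: a closed-form Gaussian-kernel KRR solver, and an iterative gradient-descent variant whose bandwidth shrinks toward zero during training. The geometric decay schedule, learning rate, and step count are illustrative assumptions for this sketch, not the paper's proposed update scheme.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    # K[i, j] = exp(-||x_i - z_j||^2 / (2 * sigma^2))
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_closed_form(X, y, sigma, lam):
    # Closed-form KRR: solve (K + lam * I) alpha = y.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_decreasing_bandwidth(X, y, sigma0, n_steps=500, decay=0.995):
    # Iterative KRR without explicit regularization: gradient descent
    # on J(alpha) = 0.5 * alpha^T K alpha - y^T alpha, whose minimizer
    # is the interpolant K alpha = y. The bandwidth is recomputed and
    # shrunk geometrically toward zero at every step (illustrative
    # schedule, not the paper's exact scheme).
    n = len(X)
    alpha = np.zeros(n)
    lr = 1.0 / n  # safe step size: lambda_max(K) <= trace(K) = n
    sigma = sigma0
    for t in range(n_steps):
        K = gaussian_kernel(X, X, sigma)
        alpha -= lr * (K @ alpha - y)   # gradient step
        sigma = sigma0 * decay**t       # shrink bandwidth toward zero
    return alpha, sigma
```

With a constant bandwidth, the first routine reproduces the usual ridge trade-off; the second trades the explicit ridge penalty for implicit regularization from early iterations, while the shrinking bandwidth lets the fit sharpen toward zero training error over time.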