Kernel ridge regression (KRR) is a generalization of linear ridge regression that is non-linear in the data but linear in the parameters. The solution can be obtained either in closed form, which requires a matrix inversion, or iteratively through gradient descent. The iterative approach makes it possible to change the kernel during training, which is what we investigate in this paper. We theoretically address the effects this has on model complexity and generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels in which the bandwidth decreases to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data that decreasing the bandwidth during training outperforms using a constant bandwidth selected by cross-validation or marginal likelihood maximization. We also show, theoretically and empirically, that a decreasing bandwidth yields both zero training error combined with good generalization and double descent behavior, phenomena that do not occur for KRR with constant bandwidth but are known to appear for neural networks.
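To make the contrast between the two solution routes concrete, the following is a minimal sketch, not the update scheme proposed in the paper. It assumes a Gaussian kernel, one-dimensional inputs, a geometric bandwidth schedule that ends at a small positive value rather than exactly zero, and a step size set to the inverse of the largest kernel eigenvalue. All function names, the schedule, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (translation-invariant) kernel matrix between 1-D inputs a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth**2))


def krr_closed_form(x_train, y_train, x_new, bandwidth, ridge):
    """Closed-form KRR with a constant bandwidth: solve (K + ridge * I) alpha = y."""
    K = gaussian_kernel(x_train, x_train, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_new, x_train, bandwidth) @ alpha


def kernel_gd_decreasing_bandwidth(x_train, y_train, x_new,
                                   bw_start, bw_end, n_steps):
    """Kernel gradient descent on the predictions, shrinking the bandwidth each step.

    Assumption: a geometric schedule from bw_start down to bw_end, and a step
    size of 1 / lambda_max(K), which keeps every update non-expansive.
    """
    f_train = np.zeros_like(y_train, dtype=float)
    f_new = np.zeros(len(x_new))
    for bw in np.geomspace(bw_start, bw_end, n_steps):
        K = gaussian_kernel(x_train, x_train, bw)
        step = 1.0 / np.linalg.norm(K, 2)  # spectral norm = largest eigenvalue
        residual = y_train - f_train       # computed before either update
        f_new += step * gaussian_kernel(x_new, x_train, bw) @ residual
        f_train += step * K @ residual
    return f_train, f_new


# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(20)
x_new = np.linspace(0.0, 1.0, 50)

f_train, f_new = kernel_gd_decreasing_bandwidth(
    x_train, y_train, x_new, bw_start=1.0, bw_end=0.01, n_steps=200
)
f_const = krr_closed_form(x_train, y_train, x_new, bandwidth=0.2, ridge=1e-2)

print("max training residual, decreasing bandwidth:",
      np.max(np.abs(f_train - y_train)))
```

In this sketch the early, wide-bandwidth steps fit the smooth, global structure of the data, while the late, narrow-bandwidth steps remove the remaining residual at each training point, so the training error is driven toward zero. Whether the resulting predictor also generalizes well depends on the schedule, which is the question the paper studies.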