Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length-scale of the kernel and has to be carefully selected to obtain a model with good generalization. The default methods for bandwidth selection are cross-validation and marginal likelihood maximization, which often yield good results, albeit at high computational cost. Furthermore, the estimates provided by these methods tend to have very high variance, especially when training data are scarce. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We then use this expression to propose a closed-form, computationally feather-light bandwidth selection heuristic based on controlling the Jacobian. In addition, the Jacobian expression illuminates how bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that, compared to cross-validation and marginal likelihood maximization, our method is considerably faster and considerably more stable in terms of bandwidth selection.
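The trade-off between smoothness and conditioning described in the abstract can be illustrated numerically. The following is a minimal sketch and does not implement the bandwidth selection heuristic proposed in the paper. It assumes the kernel parameterization exp(-||x - x'||^2 / (2 sigma^2)), a fixed ridge parameter `lam`, and a small synthetic data set; the function names `gaussian_kernel`, `krr_fit`, and `krr_predict` are illustrative choices, not taken from the source.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2)).

    The parameterization (factor 2 in the denominator) is an assumption.
    """
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the kernel ridge regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Evaluate the fitted function at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Synthetic one-dimensional data (an assumption made for illustration).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3.0, 3.0, size=(30, 1)), axis=0)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
X_grid = np.linspace(-3.0, 3.0, 200)[:, None]
lam = 1e-3  # ridge parameter, held fixed while the bandwidth varies

results = {}
for sigma in (0.05, 0.5, 5.0):
    K = gaussian_kernel(X, X, sigma)
    # Conditioning of the regularized training kernel matrix.
    cond = np.linalg.cond(K + lam * np.eye(len(X)))
    alpha = krr_fit(X, y, sigma, lam)
    f_grid = krr_predict(X, alpha, X_grid, sigma)
    # Finite-difference estimate of the largest slope of the fitted function.
    max_slope = np.max(np.abs(np.gradient(f_grid, X_grid[:, 0])))
    results[sigma] = (cond, max_slope)
    print(f"sigma={sigma:5.2f}  cond={cond:10.3e}  max|f'|={max_slope:8.3f}")
```

In this setup, a small bandwidth keeps the kernel matrix close to the identity and therefore well conditioned, but yields a function with steep derivatives; a large bandwidth produces a smoother function at the price of a much larger condition number.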