Most machine learning methods require tuning of hyper-parameters. For kernel ridge regression with the Gaussian kernel, the hyper-parameter is the bandwidth. The bandwidth specifies the length scale of the kernel and has to be carefully selected to obtain a model that generalizes well. The default methods for bandwidth selection, cross-validation and marginal likelihood maximization, often yield good results, albeit at high computational cost. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by kernel ridge regression with the Gaussian kernel depend on the kernel bandwidth. We use this expression to propose a closed-form, computationally feather-light bandwidth selection heuristic based on controlling the Jacobian. In addition, the Jacobian expression illuminates how bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that, compared to cross-validation and marginal likelihood maximization, our method is on par in terms of model performance, but up to six orders of magnitude faster.
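The trade-off between smoothness and conditioning can be observed numerically. The following is a minimal sketch, not the proposed heuristic: it fits kernel ridge regression for a given bandwidth and reports the condition number of the regularized training kernel matrix together with the largest absolute slope of the fitted function. The kernel parameterization $\exp(-\|x-x'\|^2/(2\sigma^2))$, the ridge convention $(K+\lambda I)\alpha = y$, the one-dimensional toy data, and all function names (`gaussian_kernel`, `krr_fit`, `krr_predict`, `bandwidth_diagnostics`) are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix, assuming k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y (assumed ridge convention)."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Evaluate the fitted function at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


def bandwidth_diagnostics(X, y, sigma, lam, grid):
    """Return (condition number of K + lam * I, max |df/dx| on a 1-D grid)."""
    K = gaussian_kernel(X, X, sigma)
    cond = np.linalg.cond(K + lam * np.eye(len(X)))
    alpha = krr_fit(X, y, sigma, lam)
    f = krr_predict(X, alpha, grid, sigma)
    # Finite-difference estimate of the derivative along the grid.
    slope = np.max(np.abs(np.gradient(f, grid[:, 0])))
    return cond, slope


if __name__ == "__main__":
    # Toy 1-D data; purely illustrative, not data from the paper.
    X = np.linspace(0.0, 1.0, 30)[:, None]
    y = np.sin(2.0 * np.pi * X[:, 0])
    grid = np.linspace(0.0, 1.0, 2001)[:, None]
    for sigma in (0.01, 0.1, 0.5, 2.0):
        cond, slope = bandwidth_diagnostics(X, y, sigma, 1e-3, grid)
        print(f"sigma={sigma:5.2f}  cond={cond:10.3e}  max|f'|={slope:8.3f}")
```

In this toy setting, a very small bandwidth gives a well-conditioned kernel matrix but a spiky fitted function, while a large bandwidth gives a smooth function and a poorly conditioned matrix, which is the tension a bandwidth selection rule has to resolve.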