Most machine learning methods require hyper-parameter tuning. For kernel ridge regression (KRR) with the Gaussian kernel, the central hyper-parameter is the bandwidth. The bandwidth specifies the length-scale of the kernel and must be chosen carefully to obtain a model that generalizes well. The default method for bandwidth selection is cross-validation, which often yields good results, albeit at a high computational cost. Furthermore, the bandwidths selected by cross-validation tend to have very high variance, especially when training data are scarce. Inspired by Jacobian regularization, we formulate an approximate expression for how the derivatives of the functions inferred by KRR with the Gaussian kernel depend on the kernel bandwidth. We then use this expression to propose a closed-form, computationally feather-light bandwidth selection heuristic based on controlling the Jacobian. In addition, the Jacobian expression shows how bandwidth selection is a trade-off between the smoothness of the inferred function and the conditioning of the training data kernel matrix. We show on real and synthetic data that, compared to cross-validation, our method selects the bandwidth considerably more stably and, for small data sets, provides better predictions.
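The abstract does not state the closed-form Jacobian heuristic itself, so the sketch below only illustrates the setting it is compared against: Gaussian-kernel KRR, bandwidth selection by k-fold cross-validation, and how the condition number of the training kernel matrix grows with the bandwidth. It assumes NumPy, the kernel convention k(x, x') = exp(-||x - x'||² / (2σ²)), and a ridge term λI (some formulations use nλI instead). All function names are hypothetical, and this is not the authors' method.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def krr_fit(X, y, bandwidth, lam):
    """Solve (K + lam * I) alpha = y for the dual coefficients (assumed ridge convention)."""
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, bandwidth):
    """Evaluate f(x) = sum_i alpha_i * k(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

def cv_bandwidth(X, y, bandwidths, lam=1e-3, n_folds=5, seed=0):
    """Baseline: return the bandwidth with the lowest mean k-fold validation MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for bw in bandwidths:
        fold_mse = []
        for k in range(n_folds):
            val = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            alpha = krr_fit(X[train], y[train], bw, lam)  # one linear solve per fold
            pred = krr_predict(X[train], alpha, X[val], bw)
            fold_mse.append(np.mean((pred - y[val]) ** 2))
        scores.append(np.mean(fold_mse))
    return bandwidths[int(np.argmin(scores))], np.array(scores)

if __name__ == "__main__":
    # Small synthetic 1-D problem: noisy samples of sin(x).
    rng = np.random.default_rng(1)
    X = np.sort(rng.uniform(-3, 3, size=(30, 1)), axis=0)
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
    best_bw, _ = cv_bandwidth(X, y, np.logspace(-2, 1, 20))
    print("CV-selected bandwidth:", best_bw)
    # Wider kernels give smoother fits but a worse-conditioned kernel matrix.
    for bw in (0.1, 1.0, 3.0):
        print(f"bandwidth={bw}: cond(K)={np.linalg.cond(gaussian_kernel(X, X, bw)):.3e}")
```

Each candidate bandwidth costs `n_folds` linear solves, which is the computational burden the abstract attributes to cross-validation. The printed condition numbers illustrate the other side of the trade-off the abstract describes: large bandwidths favor smooth functions but make the training kernel matrix nearly singular.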