Gaussian process (GP) regression is a Bayesian nonparametric method for regression and interpolation that offers a principled way of quantifying the uncertainty of predicted function values. For the quantified uncertainties to be well-calibrated, however, the kernel of the GP prior must be selected carefully. In this paper, we theoretically compare two methods for choosing the kernel in GP regression: cross-validation and maximum likelihood estimation. Focusing on scale-parameter estimation for a Brownian motion kernel in the noiseless setting, we prove that cross-validation can yield asymptotically well-calibrated credible intervals for a broader class of ground-truth functions than maximum likelihood estimation, suggesting an advantage of the former over the latter. Finally, motivated by these findings, we propose interior cross-validation, a procedure that adapts to an even broader class of ground-truth functions.
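To make the two estimators concrete, the following is a minimal sketch of scale-parameter estimation for the Brownian motion kernel k(x, x') = min(x, x') from noiseless observations. It is an illustration under stated assumptions, not the paper's implementation: the function names (`brownian_kernel`, `scale_ml`, `scale_loo_cv`) are hypothetical, the maximum likelihood estimate is taken to be the standard closed form yᵀK⁻¹y / n, and the cross-validation estimate is assumed to be the leave-one-out form that averages squared leave-one-out residuals divided by the unit-scale leave-one-out predictive variances. Interior cross-validation is not shown, since the abstract does not define it.

```python
import numpy as np


def brownian_kernel(x):
    """Unit-scale Brownian motion kernel matrix, K[i, j] = min(x[i], x[j])."""
    return np.minimum.outer(x, x)


def scale_ml(x, y):
    """Maximum likelihood estimate of the scale: y^T K^{-1} y / n."""
    K = brownian_kernel(x)
    return float(y @ np.linalg.solve(K, y)) / len(x)


def scale_loo_cv(x, y):
    """Leave-one-out CV estimate of the scale (assumed form, see lead-in).

    Uses the standard identities for noiseless GP interpolation:
      LOO residual at x[i]  = (K^{-1} y)[i] / (K^{-1})[i, i]
      LOO variance at x[i]  = 1 / (K^{-1})[i, i]
    so residual^2 / variance = (K^{-1} y)[i]^2 / (K^{-1})[i, i].
    """
    K_inv = np.linalg.inv(brownian_kernel(x))
    alpha = K_inv @ y
    return float(np.mean(alpha**2 / np.diag(K_inv)))


# Demo: a simulated Brownian path with true scale 4.0 on an equispaced grid.
# The grid excludes 0, where the Brownian motion kernel is degenerate.
rng = np.random.default_rng(0)
n, true_scale = 200, 4.0
x = np.linspace(0.0, 1.0, n + 1)[1:]
increments = rng.normal(scale=np.sqrt(true_scale / n), size=n)
y = np.cumsum(increments)

print("ML estimate:    ", scale_ml(x, y))
print("LOO-CV estimate:", scale_loo_cv(x, y))
```

For a path actually drawn from the prior, as in this demo, both estimates should land near the true scale. The comparison in the abstract concerns ground-truth functions that are not such samples, where the two estimators can behave differently.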