As is well known, both sampling from the posterior and computing the mean of the posterior in Gaussian process regression reduce to solving a large linear system of equations. We study the use of stochastic gradient descent for solving this linear system, and show that when \emph{done right} -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective. To that end, we introduce a particularly simple \emph{stochastic dual descent} algorithm, explain its design in an intuitive manner and illustrate the design choices through a series of ablation studies. Further experiments demonstrate that our new method is highly competitive. In particular, our evaluations on the UCI regression tasks and on Bayesian optimisation set our approach apart from preconditioned conjugate gradients and variational Gaussian process approximations. Moreover, our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
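To make the reduction to a linear system concrete, the following is a minimal sketch and not the stochastic dual descent algorithm itself. It assumes a squared-exponential kernel, synthetic 1-D data, and a plain random block-coordinate gradient step on the quadratic $\tfrac{1}{2}\alpha^\top (K + \sigma^2 I)\alpha - \alpha^\top y$, whose minimiser solves $(K + \sigma^2 I)\alpha = y$. The names `rbf_kernel` and `random_coordinate_solve`, the step-size rule, and all hyperparameter values are illustrative assumptions; the design choices the abstract alludes to are not reproduced here.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=0.5):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def random_coordinate_solve(A, y, batch_size=8, num_steps=5000, seed=0):
    """Approximately minimise 0.5 * a^T A a - a^T y for symmetric positive-definite A.

    Each step samples a random block of coordinates and takes a gradient
    step on that block only, so only `batch_size` rows of A are touched.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    # The maximum absolute row sum upper-bounds the largest eigenvalue of A,
    # so this step size cannot increase the objective on any block.
    step = 1.0 / np.abs(A).sum(axis=1).max()
    alpha = np.zeros(n)
    for _ in range(num_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad_block = A[idx] @ alpha - y[idx]  # gradient restricted to the block
        alpha[idx] -= step * grad_block
    return alpha


# Synthetic regression problem (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 5.0, size=40)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(40)
noise_var = 0.5  # deliberately large so the toy system is well conditioned

A = rbf_kernel(x_train, x_train) + noise_var * np.eye(40)
alpha_sgd = random_coordinate_solve(A, y_train)
alpha_exact = np.linalg.solve(A, y_train)  # direct solve, feasible only for small n

# Posterior mean at test inputs: k(x_test, X) @ alpha.
x_test = np.linspace(0.0, 5.0, 5)
mean_sgd = rbf_kernel(x_test, x_train) @ alpha_sgd
mean_exact = rbf_kernel(x_test, x_train) @ alpha_exact
print("max abs difference in posterior mean:", np.max(np.abs(mean_sgd - mean_exact)))
```

On a problem this small the direct solve is of course preferable; the point of the sketch is only that each iteration reads a handful of kernel rows, which is what makes stochastic approaches attractive when the full system is too large to factorise.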