We study the optimisation problem associated with Gaussian process regression using squared loss. The most common approach to this problem is to apply an exact solver, such as conjugate gradient descent, either directly or to a reduced-order version of the problem. Recently, driven by successes in deep learning, stochastic gradient descent has gained traction as an alternative. In this paper, we show that when done right (by which we mean using specific insights from the optimisation and kernel communities) this approach is highly effective. We thus introduce a particular stochastic dual gradient descent algorithm that may be implemented with a few lines of code using any deep learning framework. We explain our design decisions by illustrating their advantage against alternatives with ablation studies, and show that the new method is highly competitive. Our evaluations on standard regression benchmarks and a Bayesian optimisation task set our approach apart from preconditioned conjugate gradients, variational Gaussian process approximations, and a previous version of stochastic gradient descent for Gaussian processes. On a molecular binding affinity prediction task, our method places Gaussian process regression on par in terms of performance with state-of-the-art graph neural networks.
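To make the idea of stochastic descent on a dual objective concrete, here is a minimal NumPy sketch. It is not the algorithm introduced in the paper, whose details the abstract does not give. It assumes the standard dual objective for kernel ridge or Gaussian process regression, $\tfrac{1}{2}\alpha^\top (K + \sigma^2 I)\alpha - \alpha^\top y$, and minimises it with plain random-block coordinate updates. The helper names (`rbf_kernel`, `stochastic_dual_descent`), the toy data, the conservative step size, and all hyperparameters are illustrative assumptions. Refinements such as momentum or iterate averaging are omitted.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def stochastic_dual_descent(K, y, noise_var, steps=10_000, batch=10, seed=0):
    """Minimise 0.5 * a^T (K + noise_var * I) a - a^T y with random block updates.

    Hypothetical helper for illustration; not the paper's exact procedure.
    """
    n = y.shape[0]
    rng = np.random.default_rng(seed)
    # Conservative step size (an assumption of this sketch): the largest
    # eigenvalue of any sampled block of K + noise_var * I is at most
    # batch * max(diag(K)) + noise_var, so this step never overshoots.
    step = 1.0 / (batch * K.diagonal().max() + noise_var)
    alpha = np.zeros(n)
    for _ in range(steps):
        # Sample a block of distinct coordinates of the dual variable.
        idx = rng.choice(n, size=batch, replace=False)
        # Gradient of the dual objective restricted to those coordinates.
        residual = K[idx] @ alpha + noise_var * alpha[idx] - y[idx]
        alpha[idx] -= step * residual
    return alpha


# Toy 1-D regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(100, 1))
noise_var = 0.5
y = np.sin(x[:, 0]) + np.sqrt(noise_var) * rng.standard_normal(100)

K = rbf_kernel(x, x)
alpha_sdd = stochastic_dual_descent(K, y, noise_var)
# Reference solution from a direct linear solve.
alpha_exact = np.linalg.solve(K + noise_var * np.eye(100), y)
```

Each step touches only `batch` rows of the kernel matrix, so the per-iteration cost is linear in the number of training points. The posterior mean at the training inputs is then `K @ alpha_sdd`, which on this small problem should agree closely with the direct solve.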