Scaling hyperparameter optimisation to very large datasets remains an open problem in the Gaussian process community. This paper focuses on iterative methods, which use linear system solvers, such as conjugate gradients, alternating projections, or stochastic gradient descent, to construct an estimate of the marginal likelihood gradient. We discuss three key improvements which are applicable across solvers: (i) a pathwise gradient estimator, which reduces the required number of solver iterations and amortises the computational cost of making predictions, (ii) warm starting linear system solvers with the solution from the previous step, which leads to faster solver convergence at the cost of negligible bias, (iii) early stopping linear system solvers after a limited computational budget, which synergises with warm starting, allowing solver progress to accumulate over multiple marginal likelihood steps. These techniques provide speed-ups of up to $72\times$ when solving to tolerance, and decrease the average residual norm by up to $7\times$ when stopping early.
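A minimal sketch of points (ii) and (iii), assuming a NumPy conjugate-gradient solver (all names and settings here are illustrative, not the paper's implementation): the solver takes an initial guess for warm starting and an iteration cap for early stopping, so that across successive hyperparameter steps the previous solution seeds the next solve and progress accumulates under a fixed per-step budget.

```python
import numpy as np


def conjugate_gradients(A, b, x0=None, tol=1e-6, max_iters=20):
    """Solve A x = b for SPD A by conjugate gradients.

    x0 warm-starts the solver; max_iters enforces early stopping.
    """
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Toy setting (hypothetical, for illustration): an RBF kernel matrix with
# observation noise, whose lengthscale changes slightly at each outer step,
# mimicking successive marginal likelihood gradient steps.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)


def kernel(lengthscale):
    d2 = (X - X.T) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2) + 0.1 * np.eye(n)


# Warm-started, budgeted solves across successive hyperparameter settings:
# the previous solution is reused as the initial guess for the next system.
x = None
for ls in [1.0, 0.95, 0.9]:
    K = kernel(ls)
    x = conjugate_gradients(K, y, x0=x, max_iters=15)

# For comparison: a cold start at the final setting with the same budget.
x_cold = conjugate_gradients(kernel(0.9), y, max_iters=15)
```

Because the kernel matrix changes only slightly between steps, the warm-started residual typically starts much smaller than the cold-started one, which is the mechanism behind the accumulation of solver progress described above.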