Gaussian processes are a powerful framework for quantifying uncertainty and for sequential decision-making, but they are limited by the requirement of solving linear systems. In general, solving these systems has a cubic cost in dataset size and is sensitive to conditioning. We explore stochastic gradient algorithms as a computationally efficient method of approximately solving these linear systems: we develop low-variance optimization objectives for sampling from the posterior and extend these to inducing points. Counterintuitively, stochastic gradient descent often produces accurate predictions, even in cases where it does not converge quickly to the optimum. We explain this through a spectral characterization of the implicit bias from non-convergence. We show that stochastic gradient descent produces predictive distributions close to the true posterior both in regions with sufficient data coverage and in regions sufficiently far away from the data. Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. Its uncertainty estimates match the performance of significantly more expensive baselines on a large-scale Bayesian optimization task.
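To make "approximately solving these linear systems" concrete, here is a minimal sketch, assuming standard GP regression with a 1-D RBF kernel, of minibatch SGD applied to the system (K + σ²I)v = y. It minimizes the quadratic L(v) = ½‖y − Kv‖² + (σ²/2)vᵀKv, whose minimizer is v* = (K + σ²I)⁻¹y. The objective form, step-size rule, batch size, and iterate averaging are illustrative assumptions, not necessarily the algorithm evaluated in this work. In particular, the regularizer gradient σ²Kv is computed exactly, which costs O(n²) per step and only makes sense for small n. The helper names `rbf_kernel` and `sgd_representer_weights` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)) for 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def sgd_representer_weights(K, y, noise_var, batch_size=10, steps=6000, seed=0):
    """Approximate v* = (K + noise_var I)^{-1} y by minibatch SGD on
    L(v) = 0.5 * ||y - K v||^2 + 0.5 * noise_var * v^T K v,
    whose gradient is K ((K + noise_var I) v - y)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Largest Hessian eigenvalue is lam_max * (lam_max + noise_var); the factor
    # 0.5 is a conservative (assumed) choice for the stochastic setting.
    lam_max = np.linalg.eigvalsh(K)[-1]
    lr = 0.5 / (lam_max * (lam_max + noise_var))
    v = np.zeros(n)
    v_avg = np.zeros(n)
    n_avg = 0
    for t in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Unbiased minibatch estimate of K (K v - y), using only the sampled rows.
        resid = K[idx] @ v - y[idx]
        grad = (n / batch_size) * (K[:, idx] @ resid)
        # Exact regularizer gradient noise_var * K v (O(n^2) per step).
        grad += noise_var * (K @ v)
        v -= lr * grad
        # Running mean of the iterates over the second half of the run.
        if t >= steps // 2:
            n_avg += 1
            v_avg += (v - v_avg) / n_avg
    return v_avg


# Usage on a small synthetic problem (all data here is made up for illustration).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 5.0, size=40))
y = np.sin(2.0 * x) + 0.3 * rng.standard_normal(40)
noise_var = 0.5
K = rbf_kernel(x, x)

v_exact = np.linalg.solve(K + noise_var * np.eye(40), y)
v_sgd = sgd_representer_weights(K, y, noise_var)

# Posterior mean at the training inputs is K v.
mean_exact = K @ v_exact
mean_sgd = K @ v_sgd
rel_err = np.linalg.norm(mean_sgd - mean_exact) / np.linalg.norm(mean_exact)
weight_err = np.linalg.norm(v_sgd - v_exact) / np.linalg.norm(v_exact)
print(f"relative error of posterior mean: {rel_err:.3f}")
print(f"relative error of weights v:      {weight_err:.3f}")
```

Under these assumed settings, we would expect the posterior mean at the training inputs to agree closely with the exact solve, while the weights `v` stay far from `v_exact` along eigenvectors of K with small eigenvalues λ, where the curvature λ(λ + σ²) is tiny and SGD makes little progress. Because those directions are also scaled down by λ in the prediction Kv, the mean remains accurate; this is a toy illustration of the abstract's point that SGD can predict well without converging.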