Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stability of scalable sparse approximations based on inducing points. To do so, we first review numerical stability, and illustrate typical situations in which Gaussian process models can be unstable. Building on stability theory originally developed in the interpolation literature, we derive sufficient and in certain cases necessary conditions on the inducing points for the computations performed to be numerically stable. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions. This is done via a modification of the cover tree data structure, which is of independent interest. We additionally propose an alternative sparse approximation for regression with a Gaussian likelihood which trades off a small amount of performance to further improve stability. We provide illustrative examples showing the relationship between stability of calculations and predictive performance of inducing point methods on spatial tasks.
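To make the link between inducing point placement and numerical stability concrete, the following minimal sketch compares the condition number of a kernel matrix built on well-separated points with one built on tightly clustered points. It is an illustration only, not the method proposed in this work: the squared exponential kernel, the unit lengthscale, the one-dimensional point layouts, and the helper names `squared_exponential` and `kernel_condition_number` are all assumptions chosen for the example.

```python
import numpy as np


def squared_exponential(x, z, lengthscale=1.0):
    """Squared exponential kernel between two 1-D arrays of inputs."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def kernel_condition_number(z, lengthscale=1.0):
    """2-norm condition number of the kernel matrix evaluated at points z."""
    K = squared_exponential(z, z, lengthscale)
    return np.linalg.cond(K)


# Hypothetical inducing point layouts: same count, different spacing.
well_separated = np.linspace(0.0, 10.0, 11)  # neighbours one lengthscale apart
clustered = np.linspace(0.0, 1.0, 11)        # neighbours a tenth of a lengthscale apart

cond_separated = kernel_condition_number(well_separated)
cond_clustered = kernel_condition_number(clustered)

print(f"well separated: {cond_separated:.3e}")
print(f"clustered:      {cond_clustered:.3e}")
```

Under these assumptions, the clustered layout yields a condition number many orders of magnitude larger than the well-separated one, which is the kind of situation in which a Cholesky factorization of the kernel matrix can lose accuracy or fail outright.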