Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stability of scalable sparse approximations based on inducing points. To do so, we first review numerical stability, and illustrate typical situations in which Gaussian process models can be unstable. Building on stability theory originally developed in the interpolation literature, we derive sufficient and in certain cases necessary conditions on the inducing points for the computations performed to be numerically stable. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions. This is done via a modification of the cover tree data structure, which is of independent interest. We additionally propose an alternative sparse approximation for regression with a Gaussian likelihood which trades off a small amount of performance to further improve stability. We provide illustrative examples showing the relationship between stability of calculations and predictive performance of inducing point methods on spatial tasks.
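As a rough illustration of why the placement of inducing points matters for numerical stability, the sketch below computes the condition number of a two-point kernel matrix in closed form and filters candidate points with a naive greedy minimum-separation rule. This is not the cover tree construction described above. The squared exponential kernel, the one-dimensional inputs, and the names `se_kernel`, `two_point_condition_number`, and `greedy_separated_subset` are all assumptions made for the example.

```python
import math


def se_kernel(x, y, lengthscale=1.0):
    """Squared exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def two_point_condition_number(x, y, lengthscale=1.0):
    """Condition number of the 2x2 kernel matrix [[1, k], [k, 1]].

    The eigenvalues of this matrix are 1 + k and 1 - k, so their ratio
    is available without any linear algebra library.
    """
    k = se_kernel(x, y, lengthscale)
    if k >= 1.0:
        # Coincident (or numerically indistinguishable) points: singular matrix.
        return math.inf
    return (1.0 + k) / (1.0 - k)


def greedy_separated_subset(points, min_dist):
    """Keep a point only if it lies at least min_dist from every point kept so far.

    A naive O(n * m) stand-in for a spatial data structure; hypothetical helper.
    """
    kept = []
    for p in points:
        if all(abs(p - q) >= min_dist for q in kept):
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Well-separated points give a small condition number,
    # nearly coincident points give a very large one.
    print(two_point_condition_number(0.0, 1.0))
    print(two_point_condition_number(0.0, 0.01))

    candidates = [0.0, 0.05, 0.1, 0.5, 0.52, 1.0]
    print(greedy_separated_subset(candidates, min_dist=0.3))
```

In this toy setting, moving two inducing points closer together drives the kernel matrix toward singularity, which is the kind of behavior that separation conditions on inducing points are meant to rule out.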