Gaussian processes are frequently deployed as part of larger machine learning and decision-making systems, for instance in geospatial modeling, Bayesian optimization, or in latent Gaussian models. Within a system, the Gaussian process model needs to perform in a stable and reliable manner to ensure it interacts correctly with other parts of the system. In this work, we study the numerical stability of scalable sparse approximations based on inducing points. To do so, we first review numerical stability, and illustrate typical situations in which Gaussian process models can be unstable. Building on stability theory originally developed in the interpolation literature, we derive sufficient and in certain cases necessary conditions on the inducing points for the computations performed to be numerically stable. For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions. This is done via a modification of the cover tree data structure, which is of independent interest. We additionally propose an alternative sparse approximation for regression with a Gaussian likelihood which trades off a small amount of performance to further improve stability. We provide illustrative examples showing the relationship between stability of calculations and predictive performance of inducing point methods on spatial tasks.
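To make the link between inducing point placement and numerical stability concrete, the following minimal sketch shows how the condition number of a kernel matrix grows as two inducing points approach each other, and how enforcing a minimum distance between points avoids near-duplicates. Everything in it is an illustrative assumption rather than the method of this work: the squared exponential kernel, the one-dimensional inputs, the helper names `se_kernel` and `greedy_separated_subset`, and the simple greedy filter, which stands in for the cover-tree-based construction and does not reproduce it.

```python
import numpy as np


def se_kernel(x, z, lengthscale=1.0):
    """Squared exponential kernel matrix between 1-D point sets x and z (assumed kernel)."""
    d = x[:, None] - z[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def greedy_separated_subset(x, min_dist):
    """Scan x in order and keep a point only if it is at least min_dist from all kept points.

    This is a simple O(n * m) stand-in used for illustration, not a cover tree.
    """
    kept = []
    for xi in x:
        if all(abs(xi - zj) >= min_dist for zj in kept):
            kept.append(xi)
    return np.array(kept)


# Condition number of the 2x2 kernel matrix for two inducing points at distance d.
# Its eigenvalues are 1 + k and 1 - k with k = exp(-d^2 / 2), so it blows up as d -> 0.
cond_close = np.linalg.cond(se_kernel(np.array([0.0, 1e-3]), np.array([0.0, 1e-3])))
cond_far = np.linalg.cond(se_kernel(np.array([0.0, 1.0]), np.array([0.0, 1.0])))

# Select inducing points from random 1-D inputs while enforcing a minimum separation.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
z = greedy_separated_subset(x, min_dist=0.3)
min_gap = np.min(np.diff(np.sort(z)))

print(f"condition number, points 1e-3 apart: {cond_close:.3e}")
print(f"condition number, points 1.0 apart:  {cond_far:.3e}")
print(f"kept {z.size} of {x.size} points, smallest gap {min_gap:.3f}")
```

In this sketch the separation threshold `min_dist` plays the role of a tuning parameter: larger values keep the kernel matrix better conditioned at the cost of fewer inducing points, which mirrors the trade-off between stability and predictive performance described above.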