Gaussian processes (GPs) are a popular class of Bayesian nonparametric models, but their training can be computationally burdensome for massive training datasets. While there has been notable work on scaling up these models for big data, existing methods typically rely on a stationary GP assumption for approximation, and can thus perform poorly when the underlying response surface is non-stationary, i.e., it has some regions of rapid change and other regions with little change. Such non-stationarity is, however, ubiquitous in real-world problems, including our motivating application of surrogate modeling for computer experiments. We thus propose a new Product of Sparse GP (ProSpar-GP) method for scalable GP modeling with massive non-stationary data. The ProSpar-GP uses a carefully constructed product-of-experts formulation of sparse GP experts, where different experts are placed within local regions of non-stationarity. These GP experts are fit via a novel variational inference approach, which capitalizes on mini-batching and GPU acceleration for efficient optimization of the inducing points and length-scale parameters of each expert. We further show that the ProSpar-GP is Kolmogorov-consistent, in that its generative distribution defines a valid stochastic process over the prediction space; such a property provides essential stability for variational inference, particularly in the presence of non-stationarity. We then demonstrate the improved performance of the ProSpar-GP over state-of-the-art methods in a suite of numerical experiments and in an application to surrogate modeling of a satellite drag simulator.
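For readers unfamiliar with the product-of-experts idea mentioned above, the following is a minimal sketch of the generic rule for combining independent Gaussian predictions by multiplying their densities. It is not the ProSpar-GP construction itself, whose formulation is not given in this abstract; the function name `combine_gaussian_experts` and the example numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np


def combine_gaussian_experts(means, variances):
    """Generic product of Gaussian experts (illustrative, not ProSpar-GP).

    means, variances: array-likes of shape (n_experts, n_points), holding each
    expert's predictive mean and variance at every test point.
    Multiplying Gaussian densities gives a Gaussian whose precision is the
    sum of the expert precisions and whose mean is precision-weighted.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    combined_var = 1.0 / precisions.sum(axis=0)
    combined_mean = combined_var * (precisions * means).sum(axis=0)
    return combined_mean, combined_var


# Two hypothetical experts predicting at three test points.
means = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
variances = [[1.0, 1.0, 0.25], [1.0, 4.0, 1.0]]
mean, var = combine_gaussian_experts(means, variances)
print(mean, var)
```

In this toy rule, an expert that reports a small variance at a point dominates the combined prediction there, which gives an intuition for why placing experts in different local regions can help with a non-stationary response surface.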