ProSpar-GP: scalable Gaussian process modeling with massive non-stationary datasets

Gaussian processes (GPs) are a popular class of Bayesian nonparametric models, but their training can be computationally burdensome for massive training datasets. While there has been notable work on scaling up these models for big data, existing methods typically rely on a stationary GP assumption for approximation, and can thus perform poorly when the underlying response surface is non-stationary, i.e., it has some regions of rapid change and other regions with little change. Such non-stationarity is, however, ubiquitous in real-world problems, including our motivating application for surrogate modeling of computer experiments. We thus propose a new Product of Sparse GP (ProSpar-GP) method for scalable GP modeling with massive non-stationary data. The ProSpar-GP makes use of a carefully constructed product-of-experts formulation of sparse GP experts, where different experts are placed within local regions of non-stationarity. These GP experts are fit via a novel variational inference approach, which capitalizes on mini-batching and GPU acceleration for efficient optimization of inducing points and length-scale parameters for each expert. We further show that the ProSpar-GP is Kolmogorov-consistent, in that its generative distribution defines a valid stochastic process over the prediction space; such a property provides essential stability for variational inference, particularly in the presence of non-stationarity. We then demonstrate the improved performance of the ProSpar-GP over the state-of-the-art, in a suite of numerical experiments and an application for surrogate modeling of a satellite drag simulator.
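To make the product-of-experts idea concrete, the sketch below shows the generic precision-weighted fusion of Gaussian predictions from local GP experts, each with its own length-scale. It is not the ProSpar-GP formulation itself, which the abstract does not spell out: the experts here are small exact GPs with a zero-mean prior and an RBF kernel rather than sparse variational GPs, and the names `rbf_kernel`, `gp_predict`, and `poe_combine`, the noise level, the length-scales, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_test, length_scale, noise=1e-2):
    """Exact zero-mean GP prediction (stand-in for a sparse GP expert)."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    # Prior variance is 1 (unit-variance kernel) plus observation noise.
    var = 1.0 + noise - np.sum(v**2, axis=0)
    return mean, var

def poe_combine(means, variances):
    """Generic product of Gaussian experts: precisions add, means are
    precision-weighted. Inputs have shape (n_experts, n_test)."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum(axis=0)
    mean = var * (precisions * means).sum(axis=0)
    return mean, var

# Toy non-stationary response: slow variation on the left, fast on the right.
x = np.linspace(0.0, 1.0, 40)
y = np.where(x < 0.5, np.sin(2 * np.pi * x), np.sin(16 * np.pi * x))
left = x < 0.5
x_test = np.linspace(0.0, 1.0, 9)

# One expert per local region, each with an (assumed) region-specific length-scale.
m1, v1 = gp_predict(x[left], y[left], x_test, length_scale=0.2)
m2, v2 = gp_predict(x[~left], y[~left], x_test, length_scale=0.03)
mean, var = poe_combine([m1, m2], [v1, v2])
```

Away from its own training region, an expert's predictive variance reverts toward the prior, so its precision is small and the fused prediction is dominated by the expert that actually covers that region. The combined variance is never larger than that of any single expert, which is a known property of this simple fusion rule.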

