Gaussian processes (GPs) are widely used for prediction and inference in spatial data analysis. However, because estimation and prediction have cubic time and quadratic memory complexity in the number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation, which induces sparsity in the dependence structure, is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and then updates the process parameters using a Vecchia-approximated GP likelihood. Because the Vecchia-approximated GP has time complexity that is linear in the number of locations, this yields scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. We also apply the method to the Argo dataset of ocean temperature measurements.
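The minibatch update described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: stochastic gradient Langevin dynamics (SGLD) stands in as one simple SGMCMC variant, the covariance is exponential with parameters `sigma2` and `rho` sampled on the log scale under a standard normal prior, the nugget `tau2` is held fixed, and gradients are taken by central finite differences. All function names are hypothetical. The Vecchia likelihood is written as a sum of conditional terms log p(y_i | y_N(i)), where N(i) holds up to `m` nearest earlier locations, so a minibatch of terms rescaled by n/|B| gives an unbiased estimate of the full gradient.

```python
import numpy as np

def exp_cov(a, b, sigma2, rho):
    """Exponential covariance between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def nearest_previous(locs, m):
    """For each location i, indices of up to m nearest earlier locations."""
    nbrs = [np.array([], dtype=int)]
    for i in range(1, len(locs)):
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nbrs.append(np.argsort(d)[:m])
    return nbrs

def cond_loglik(i, theta, locs, y, nbrs, tau2):
    """log p(y_i | y_N(i)) for a zero-mean GP; theta = log(sigma2, rho)."""
    sigma2, rho = np.exp(theta)
    mean, var = 0.0, sigma2 + tau2
    nb = nbrs[i]
    if nb.size > 0:
        K_nn = exp_cov(locs[nb], locs[nb], sigma2, rho) + tau2 * np.eye(nb.size)
        k_in = exp_cov(locs[i:i + 1], locs[nb], sigma2, rho)[0]
        w = np.linalg.solve(K_nn, k_in)   # kriging weights
        mean = w @ y[nb]
        var = var - w @ k_in
    return -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)

def minibatch_grad(theta, batch, locs, y, nbrs, tau2, h=1e-5):
    """Finite-difference gradient of the batch terms, rescaled by n / |B|."""
    def f(t):
        return sum(cond_loglik(i, t, locs, y, nbrs, tau2) for i in batch)
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        g[k] = (f(theta + e) - f(theta - e)) / (2 * h)
    return (len(y) / len(batch)) * g

def sgld(locs, y, m=5, batch_size=32, n_iter=200, step=1e-4, tau2=0.1, seed=0):
    """SGLD draws of theta = log(sigma2, rho) under an assumed N(0, I) prior."""
    rng = np.random.default_rng(seed)
    nbrs = nearest_previous(locs, m)
    theta = np.zeros(2)
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        batch = rng.choice(len(y), size=batch_size, replace=False)
        grad = minibatch_grad(theta, batch, locs, y, nbrs, tau2) - theta
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(2)
        samples[t] = theta
    return samples
```

Each conditional term costs O(m^3) regardless of n, so one update costs O(|B| m^3). The step size and batch size above are placeholder values and would need tuning on real data.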