Gaussian processes (GPs) are widely used for prediction and inference in spatial data analysis. However, because estimation and prediction have cubic time complexity and quadratic memory complexity in the number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation, which induces sparsity in the dependence structure, is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and then updates the process parameters using a Vecchia-approximated GP likelihood. Because the Vecchia-approximated GP likelihood can be evaluated in time linear in the number of locations, this yields scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. We also apply our method to the Argo dataset of ocean temperature measurements.
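To make the minibatch step concrete, the following is a minimal sketch and not the implementation used in this work. It assumes an exponential covariance with a nugget, a flat prior on the log-parameters, finite-difference gradients, and plain stochastic gradient Langevin dynamics (SGLD) as the simplest SGMCMC update. All function names (`exp_cov`, `neighbor_sets`, `vecchia_minibatch_loglik`, `sgld_step`) and default values are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma2, rho):
    """Exponential covariance between location sets a (p, d) and b (q, d)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)

def neighbor_sets(locs, m):
    """For each point i, the up-to-m nearest points among those ordered before i."""
    sets = [np.array([], dtype=int)]
    for i in range(1, len(locs)):
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        sets.append(np.argsort(d)[:m])
    return sets

def vecchia_minibatch_loglik(log_theta, locs, y, nbrs, batch):
    """Minibatch estimate of the Vecchia log-likelihood, rescaled by n / |batch|."""
    sigma2, rho, tau2 = np.exp(log_theta)  # assumed parameterisation
    total = 0.0
    for i in batch:
        idx = nbrs[i]
        mean, var = 0.0, sigma2 + tau2
        if len(idx) > 0:
            # Conditional distribution of y[i] given its neighbors.
            c_nn = exp_cov(locs[idx], locs[idx], sigma2, rho) + tau2 * np.eye(len(idx))
            c_in = exp_cov(locs[i:i + 1], locs[idx], sigma2, rho).ravel()
            w = np.linalg.solve(c_nn, c_in)
            mean = w @ y[idx]
            var = var - w @ c_in
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return len(y) / len(batch) * total

def sgld_step(log_theta, locs, y, nbrs, rng, batch_size=32, step=1e-4, h=1e-5):
    """One SGLD update of the log-parameters (flat prior assumed)."""
    batch = rng.choice(len(y), size=batch_size, replace=False)
    grad = np.zeros_like(log_theta)
    for k in range(len(log_theta)):
        e = np.zeros_like(log_theta)
        e[k] = h
        # Central finite difference; automatic differentiation would replace this.
        up = vecchia_minibatch_loglik(log_theta + e, locs, y, nbrs, batch)
        down = vecchia_minibatch_loglik(log_theta - e, locs, y, nbrs, batch)
        grad[k] = (up - down) / (2 * h)
    noise = rng.normal(size=log_theta.shape)
    return log_theta + 0.5 * step * grad + np.sqrt(step) * noise
```

Each conditional term involves at most m neighbors, so one update costs on the order of the minibatch size times m cubed, independent of the total number of locations. The step size and batch size shown are placeholders that would need tuning on real data.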