Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations of GPs that overcome computational bottlenecks on large data sets, and the Laplace approximation is a fast method with asymptotic convergence guarantees for approximating marginal likelihoods and posterior predictive distributions under non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when they are used with direct solvers such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations thus become prohibitively slow precisely when the approximations are usually most accurate, i.e., on large data sets. In this article, we present several iterative methods for inference with Vecchia-Laplace approximations that make computations considerably faster than Cholesky-based calculations. We analyze the proposed methods theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based inference, and a threefold increase in prediction accuracy, measured by the continuous ranked probability score, compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.
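To illustrate the contrast between direct and iterative solvers drawn in the abstract, the following is a minimal sketch and not the implementation from the article's software library. It assumes a toy system matrix of the form A = BᵀD⁻¹B + W, where B is a sparse unit lower-triangular matrix and D a diagonal matrix (the typical shape of a Vecchia-approximated precision matrix) and W is a diagonal matrix standing in for the negative Hessian of a log-likelihood. The function name `conjugate_gradient`, the bidiagonal choice of B, and all numerical values are hypothetical and chosen only for illustration. The point is that a conjugate gradient solve needs only matrix-vector products with A, whereas a Cholesky-based solve requires forming and factorizing A.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()                 # residual b - A x for the start value x = 0
    p = r.copy()                 # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            break                        # relative residual is small enough
        p = r + (rs_new / rs_old) * p    # next A-conjugate direction
        rs_old = rs_new
    return x


# Toy problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 200
B = np.eye(n) - 0.5 * np.eye(n, k=-1)    # unit lower bidiagonal: one neighbor each
d = rng.uniform(0.5, 1.5, n)             # diagonal of D
w = rng.uniform(0.1, 1.0, n)             # diagonal of W
b = rng.standard_normal(n)


def matvec(v):
    # Computes (B^T D^{-1} B + W) v without ever forming the matrix.
    return B.T @ ((B @ v) / d) + w * v


x_cg = conjugate_gradient(matvec, b)

# Direct reference solution via an explicit Cholesky factorization.
A = B.T @ np.diag(1.0 / d) @ B + np.diag(w)
L = np.linalg.cholesky(A)
x_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

print(np.max(np.abs(x_cg - x_chol)))     # difference between the two solutions
```

In this sketch B is stored densely for brevity; with a sparse representation each matrix-vector product would cost time proportional to the number of non-zero entries, which is what makes iterative solvers attractive for large sample sizes.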