Gaussian Processes (GPs) are vital for modeling and predicting large, irregularly spaced geospatial datasets. However, exact GP inference requires cubic time and quadratic memory in the number of observations, which becomes prohibitive in large-scale applications. One popular method for approximating GPs is the Vecchia approximation, which approximates the full likelihood by a product of conditional probabilities. The classical Vecchia approximation uses univariate conditional distributions, which leads to redundant evaluations and memory burdens. To address this challenge, our study introduces block Vecchia, which evaluates the multivariate conditional distribution of each block of observations, with blocks formed by the K-means algorithm. The proposed GPU framework for block Vecchia uses variable-size batched linear algebra operations to compute the multivariate conditional distributions concurrently, notably reducing the cost of the frequent likelihood evaluations. To understand the factors affecting the accuracy of block Vecchia, we investigate the neighbor selection criterion and find that random ordering markedly improves approximation quality as the number of blocks grows. To verify the scalability and efficiency of the algorithm, we conduct a series of numerical studies and simulations, demonstrating its practical utility and effectiveness compared to the exact GP. Moreover, we apply block Vecchia to a large-scale real dataset: high-resolution 3D wind speed profiles with one million points.
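The block Vecchia likelihood described above can be illustrated with a minimal CPU sketch. It assumes a zero-mean GP with an exponential covariance, blocks from plain Lloyd's K-means on the locations, a random block ordering, and conditioning sets made of the `n_neighbors` previously ordered points closest to each block's centroid. All names and parameters (`exp_cov`, `kmeans`, `block_vecchia_loglik`, `n_neighbors`, `rho`) are hypothetical, and the sequential loop over blocks stands in for the batched GPU kernels; this is not the authors' implementation or their exact neighbor selection criterion.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, rho=0.1):
    """Assumed exponential covariance: sigma2 * exp(-||a - b|| / rho)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / rho)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's K-means on locations; returns one block label per point."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # empty clusters keep their old center
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def gauss_logpdf(y, mean, cov):
    """log N(y | mean, cov) computed from a Cholesky factor of cov."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y - mean)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)


def block_vecchia_loglik(x, y, n_blocks, n_neighbors, seed=0, **cov_kw):
    """Sum of log N(y_B | y_N(B)) over K-means blocks in a random block order."""
    labels = kmeans(x, n_blocks, seed=seed)
    blocks = [np.flatnonzero(labels == j) for j in np.unique(labels)]
    order = np.random.default_rng(seed + 1).permutation(len(blocks))
    seen = np.array([], dtype=int)  # points in previously ordered blocks
    total = 0.0
    for j in order:
        b = blocks[j]
        S_bb = exp_cov(x[b], x[b], **cov_kw)
        if len(seen) > 0 and n_neighbors > 0:
            # Conditioning set: earlier points nearest to this block's centroid.
            dist = np.linalg.norm(x[seen] - x[b].mean(axis=0), axis=1)
            nb = seen[np.argsort(dist)[:n_neighbors]]
            S_nn = exp_cov(x[nb], x[nb], **cov_kw)
            S_bn = exp_cov(x[b], x[nb], **cov_kw)
            A = np.linalg.solve(S_nn, S_bn.T).T  # A = S_bn @ inv(S_nn)
            # Conditional mean and covariance of the block given its neighbors.
            mean, cov = A @ y[nb], S_bb - A @ S_bn.T
        else:
            mean, cov = np.zeros(len(b)), S_bb
        total += gauss_logpdf(y[b], mean, cov)
        seen = np.concatenate([seen, b])
    return total


rng = np.random.default_rng(42)
x = rng.uniform(size=(150, 2))  # synthetic 2D locations
K = exp_cov(x, x)
y = np.linalg.cholesky(K) @ rng.standard_normal(150)  # one GP sample
exact = gauss_logpdf(y, np.zeros(150), K)
approx = block_vecchia_loglik(x, y, n_blocks=15, n_neighbors=30)
print(f"exact: {exact:.3f}  block Vecchia: {approx:.3f}")
```

When every earlier point is allowed as a neighbor, the chain rule makes the block Vecchia value coincide with the exact log-likelihood. Shrinking `n_neighbors` trades some accuracy for many small, independent per-block factorizations, which is what makes batching them on a GPU attractive.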