GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations

Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, evaluating the log-likelihood function needed to estimate the statistical model parameters for geospatial data is computationally intensive because it involves inverting a covariance matrix of size n × n, where n is the number of geographical locations. As a result, studies in the literature have shifted toward approximation methods that handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. The Vecchia approximation is one of the most promising methods for speeding up the evaluation of the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation that uses batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently evaluate the individual conditional distributions in the Vecchia algorithm. We use the KBLAS linear algebra library to perform the batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700X, 833X, and 1380X on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also handle larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations on 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy of the implemented algorithm, identifying the optimal settings for the Vecchia approximation that preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.
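
To make the batching idea concrete, the following is a minimal CPU sketch in Python, not the implementation described above. The Vecchia approximation replaces the joint log-likelihood with a sum of conditional terms, log p(y_i | y_c(i)), where each conditioning set c(i) holds at most m earlier points. Because every term then needs a Cholesky factorization and a solve on an m × m matrix, the terms can be stacked and processed as one batch. Here NumPy's stacked `cholesky` and `solve` stand in for the GPU batched routines; the exponential kernel, the nearest-earlier-neighbor rule, the given point ordering, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def exp_cov(a, b, variance=1.0, length_scale=0.1):
    """Exponential covariance; a: (..., p, d), b: (..., q, d) -> (..., p, q)."""
    dist = np.linalg.norm(a[..., :, None, :] - b[..., None, :, :], axis=-1)
    return variance * np.exp(-dist / length_scale)


def exact_loglik(locs, y, variance=1.0, length_scale=0.1):
    """Dense zero-mean Gaussian log-likelihood via one n x n Cholesky."""
    n = len(y)
    chol = np.linalg.cholesky(exp_cov(locs, locs, variance, length_scale))
    z = np.linalg.solve(chol, y)
    return -0.5 * (z @ z) - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2 * np.pi)


def vecchia_loglik(locs, y, m, variance=1.0, length_scale=0.1):
    """Vecchia log-likelihood with conditioning sets of size m (1 <= m <= n)."""
    n = len(y)
    # The first m points have fewer than m predecessors: treat them jointly.
    ll = exact_loglik(locs[:m], y[:m], variance, length_scale)
    if m == n:
        return ll

    # For each remaining point i, pick the m nearest points among 0..i-1.
    idx = np.empty((n - m, m), dtype=int)
    for row, i in enumerate(range(m, n)):
        dist = np.linalg.norm(locs[:i] - locs[i], axis=1)
        idx[row] = np.argsort(dist)[:m]

    nb_locs = locs[idx]                    # (B, m, d) neighbor locations
    nb_y = y[idx]                          # (B, m)    neighbor observations
    target = locs[m:, None, :]             # (B, 1, d) points being predicted

    k_cc = exp_cov(nb_locs, nb_locs, variance, length_scale)  # (B, m, m)
    k_ci = exp_cov(nb_locs, target, variance, length_scale)   # (B, m, 1)

    # Batched step: one Cholesky and one solve over all B small systems.
    chol = np.linalg.cholesky(k_cc)
    rhs = np.concatenate([k_ci, nb_y[..., None]], axis=-1)    # (B, m, 2)
    sol = np.linalg.solve(chol, rhs)
    a, b = sol[..., 0], sol[..., 1]        # L^{-1} k_ci and L^{-1} y_c

    cond_mean = (a * b).sum(axis=-1)
    cond_var = variance - (a * a).sum(axis=-1)
    resid = y[m:] - cond_mean
    ll += (-0.5 * (np.log(2 * np.pi * cond_var) + resid**2 / cond_var)).sum()
    return ll


# Synthetic data: random locations in the unit square, one GP sample.
rng = np.random.default_rng(0)
locs = rng.random((200, 2))
y = np.linalg.cholesky(exp_cov(locs, locs)) @ rng.standard_normal(200)

print("exact  :", exact_loglik(locs, y))
print("vecchia:", vecchia_loglik(locs, y, m=15))
```

In this sketch the dense evaluation factorizes a single n × n matrix, while the Vecchia version factorizes n - m independent m × m matrices, which is the workload shape that batched GPU kernels are designed for. Setting m = n - 1 makes every conditioning set contain all earlier points, so the approximation reduces to the exact log-likelihood; this is a convenient sanity check.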
