Spatio-temporal Bayesian inference with latent Gaussian models drives research in the environmental and health sciences. Integrated Nested Laplace Approximations (INLA) enable inference for these models at HPC scale but rely on derivative-based optimization over $d$ hyperparameters. State-of-the-art INLA implementations approximate derivatives via central finite differences (FD), requiring $2d{+}1$ objective evaluations per gradient. These evaluations are embarrassingly parallel, but total work and energy grow with $d$, limiting time-to-solution under fixed budgets. Reverse-mode automatic differentiation (AD) computes exact gradients at a cost independent of $d$, but applying it efficiently to INLA's structured-sparse kernels remains an open challenge. We present ADELIA, the first AD-enabled INLA implementation, featuring a structure-exploiting multi-GPU backward pass that leverages model sparsity. We evaluate ADELIA on ten benchmark models, including a real-world air-pollution monitoring application. ADELIA achieves $4.2$--$7.9\times$ per-gradient speedups and converges reliably on production-scale models with up to 1.9M latent variables, where FD struggles. Even when FD is scaled to 16--32 GPUs to match ADELIA's wall-clock time, it consumes $5$--$8\times$ more energy.
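To make the cost contrast concrete, the sketch below counts objective evaluations for a central-FD gradient ($f(\theta)$ plus two perturbed evaluations per coordinate, $2d{+}1$ in total) against a minimal tape-based reverse-mode AD, which records one forward evaluation and then propagates adjoints in a single backward sweep. It is a scalar, pure-Python illustration under stated assumptions: `objective` is a hypothetical toy function standing in for the INLA hyperparameter objective, and `Var`, `backward`, `fd_gradient`, and `ad_gradient` are illustrative names. None of this is ADELIA's API or its structure-exploiting sparse multi-GPU backward pass.

```python
import math


class Var:
    """Scalar node on a reverse-mode AD tape (illustrative, not ADELIA's API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local partial derivative)
        self.grad = 0.0         # adjoint, filled in by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def exp(x):
    """exp that works on plain floats and on taped Vars."""
    if isinstance(x, Var):
        v = math.exp(x.value)
        return Var(v, ((x, v),))  # d/dx exp(x) = exp(x)
    return math.exp(x)


def backward(out):
    """Single reverse sweep: propagate adjoints from out to every input."""
    order, seen = [], set()

    def visit(node):  # post-order DFS: inputs are appended before their consumers
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    for node in reversed(order):  # every consumer is processed before its inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad


def objective(theta):
    """Hypothetical toy objective: sum_i exp(theta_i) + (sum_i theta_i)^2."""
    total, s = 0.0, 0.0
    for t in theta:
        total = total + exp(t)
        s = s + t
    return total + s * s


def fd_gradient(f, theta, h=1e-5):
    """Central FD: one call for f(theta) plus two calls per coordinate."""
    calls = 0

    def counted(x):
        nonlocal calls
        calls += 1
        return f(x)

    value = counted(theta)
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += h
        minus[i] -= h
        grad.append((counted(plus) - counted(minus)) / (2 * h))
    return value, grad, calls


def ad_gradient(f, theta):
    """Reverse mode: one taped call of f, then one backward sweep."""
    xs = [Var(t) for t in theta]
    out = f(xs)
    backward(out)
    return out.value, [x.grad for x in xs], 1


theta = [0.1 * i for i in range(8)]  # d = 8 hyperparameters
_, g_fd, n_fd = fd_gradient(objective, theta)
_, g_ad, n_ad = ad_gradient(objective, theta)
print(n_fd, n_ad)  # 17 1
```

With $d = 8$ the FD routine makes 17 calls, while the AD routine makes a single taped call whose backward sweep costs a small constant multiple of that evaluation however large $d$ is, at the price of storing the tape. The FD calls are mutually independent, which is what makes them embarrassingly parallel; reverse mode trades that parallelism for less total work.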