Scalable non-separable spatio-temporal Gaussian process models for large-scale short-term weather prediction

Monitoring daily weather fields is critical for climate science, agriculture, and environmental planning, yet fully probabilistic spatio-temporal models become computationally prohibitive at continental scale. We present a case study on short-term forecasting of daily maximum temperature and precipitation across the conterminous United States using novel scalable spatio-temporal Gaussian process methodology. Building on three approximation families - inducing-point methods (FITC), Vecchia approximations, and a hybrid Vecchia-inducing-point full-scale approach (VIF) - we introduce three extensions that address key bottlenecks in large space-time settings: (i) a scalable correlation-based neighbor selection strategy for Vecchia approximations with point-referenced data, enabling accurate conditioning under complex dependence structures, (ii) a space-time kMeans++ inducing-point selection algorithm, and (iii) GPU-accelerated implementations of computationally expensive operations, including matrix operations and neighbor searches. Using both synthetic experiments and a large NOAA station dataset containing more than one million space-time observations, we analyze the models with respect to predictive performance, parameter estimation, and computational efficiency. Our results demonstrate that scalable Gaussian process models can yield accurate continental-scale forecasts while remaining computationally feasible, offering practical tools for weather applications.
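
To make extension (i) concrete, the following is a minimal sketch of correlation-based neighbor selection for a Vecchia approximation. It is not the authors' implementation. The kernel `spacetime_correlation`, its parameters `range_space` and `range_time`, and the brute-force search are all assumptions made for illustration, and a scalable version would replace the quadratic search with a faster neighbor search. The idea shown is only that each point conditions on the earlier points (in a fixed ordering) with which it is most strongly correlated, rather than on the points nearest in raw coordinates.

```python
import math


def spacetime_correlation(p, q, range_space=1.0, range_time=1.0):
    """Hypothetical stand-in kernel: exponential correlation on a scaled
    space-time distance. Points are (x, y, t) tuples."""
    (x1, y1, t1), (x2, y2, t2) = p, q
    d_space = math.hypot(x1 - x2, y1 - y2) / range_space
    d_time = abs(t1 - t2) / range_time
    return math.exp(-math.hypot(d_space, d_time))


def select_neighbors(points, m, corr):
    """For each point i (in the given ordering), return the indices of up to
    m earlier points with the highest correlation to point i.

    Brute force, O(n^2 log n); for illustration only."""
    neighbors = []
    for i, p in enumerate(points):
        ranked = sorted(range(i), key=lambda j: corr(p, points[j]), reverse=True)
        neighbors.append(ranked[:m])
    return neighbors


# Assumed example: temporal dependence decays ten times more slowly than
# spatial dependence, so a same-location observation from three time steps
# earlier is more informative than a same-time observation two units away.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 3.0), (0.0, 0.0, 3.0)]


def corr(p, q):
    return spacetime_correlation(p, q, range_space=1.0, range_time=10.0)


neighbor_sets = select_neighbors(points, m=1, corr=corr)
```

In this toy configuration the third point selects the first point as its single neighbor, although the second point is closer in unscaled Euclidean distance. That difference is what a correlation-based criterion is meant to capture when space and time have different dependence ranges.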

Gaussian process

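A Gaussian process (GP) is a type of stochastic process in probability theory and mathematical statistics: a collection of random variables, indexed by an index set, such that every finite subcollection is jointly normally distributed. Equivalently, every finite linear combination of these random variables is normally distributed. On a continuous index set, the law of the process is a Gaussian measure on a space of functions, so a Gaussian process can be viewed as an infinite-dimensional generalization of the multivariate normal distribution. A Gaussian process is fully determined by its mean function and covariance function, and it inherits many properties of the normal distribution.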
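
The defining property above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the paper: $S$ is the index set (for example, space-time locations), $m$ is the mean function, and $k$ is the covariance function.

```latex
f \sim \mathcal{GP}(m, k)
\quad\Longleftrightarrow\quad
\bigl(f(s_1), \dots, f(s_n)\bigr)^{\top} \sim \mathcal{N}(\boldsymbol{m}, \boldsymbol{K})
\quad \text{for every finite } \{s_1, \dots, s_n\} \subset S,
\qquad
m_i = m(s_i), \quad K_{ij} = k(s_i, s_j).
```

Because every finite-dimensional distribution is fixed by $m$ and $k$ alone, specifying these two functions specifies the whole process.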
