Spatiotemporal datasets, which consist of spatially referenced time series, are ubiquitous in scientific and business-intelligence applications such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As modern datasets continue to increase in size and complexity, there is a growing need for new statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle large prediction problems. This work presents the Bayesian Neural Field (BayesNF), a domain-general statistical model for inferring rich probability distributions over a spatiotemporal domain, which can be used for data-analysis tasks including forecasting, interpolation, and variography. BayesNF integrates a novel deep neural network architecture for high-capacity function estimation with hierarchical Bayesian inference for robust uncertainty quantification. Because the prior is defined through a sequence of smooth differentiable transforms, posterior inference can be conducted on large-scale data using variationally learned surrogates trained via stochastic gradient descent. We evaluate BayesNF against prominent statistical and machine-learning baselines, showing considerable improvements on diverse prediction problems from climate and public health datasets that contain tens to hundreds of thousands of measurements. The paper is accompanied by an open-source software package (https://github.com/google/bayesnf) that is easy to use and compatible with modern GPU and TPU accelerators on the JAX machine learning platform.
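To make the phrase "variationally learned surrogates trained via stochastic gradient descent" concrete, the following is a minimal sketch on a toy problem. It is not the BayesNF model or its software API. It assumes a one-parameter Bayesian linear regression with a standard normal prior and a known noise level, a Gaussian surrogate q(w) = N(mu, exp(rho)^2), reparameterized minibatch gradients of the evidence lower bound, and Adam updates. All function names and hyperparameters (`make_data`, `exact_posterior`, `fit_surrogate`, the step count, the batch size) are hypothetical choices made for illustration, and the code uses plain Python rather than JAX so that it runs without dependencies.

```python
import math
import random


def make_data(n, true_w, noise_sd, rng):
    """Simulate y = true_w * x + Gaussian noise (toy data, not from the paper)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def exact_posterior(xs, ys, noise_sd):
    """Closed-form Gaussian posterior (mean, sd) for w under a N(0, 1) prior."""
    precision = 1.0 + sum(x * x for x in xs) / noise_sd**2
    mean = sum(x * y for x, y in zip(xs, ys)) / noise_sd**2 / precision
    return mean, 1.0 / math.sqrt(precision)


def fit_surrogate(xs, ys, noise_sd, rng, steps=4000, batch=20, lr=0.01):
    """Fit q(w) = N(mu, exp(rho)^2) by stochastic gradient ascent on the ELBO."""
    n = len(xs)
    params = [0.0, math.log(0.1)]       # [mu, rho], arbitrary starting point
    m, v = [0.0, 0.0], [0.0, 0.0]       # Adam moment estimates
    b1, b2, tiny = 0.9, 0.999, 1e-8
    tail = []                           # iterates kept for averaging
    for t in range(1, steps + 1):
        mu, rho = params
        s = math.exp(rho)
        eps = rng.gauss(0.0, 1.0)
        w = mu + s * eps                # reparameterized draw from q
        idx = rng.sample(range(n), batch)
        # d/dw of [minibatch log-likelihood rescaled to n points + log prior]
        dlogp_dw = (n / batch) * sum(
            (ys[i] - w * xs[i]) * xs[i] for i in idx
        ) / noise_sd**2 - w
        # Chain rule through w = mu + exp(rho) * eps; the +1 is d(entropy)/d(rho)
        grads = [dlogp_dw, dlogp_dw * eps * s + 1.0]
        for k in range(2):
            m[k] = b1 * m[k] + (1 - b1) * grads[k]
            v[k] = b2 * v[k] + (1 - b2) * grads[k] ** 2
            m_hat = m[k] / (1 - b1**t)
            v_hat = v[k] / (1 - b2**t)
            params[k] += lr * m_hat / (math.sqrt(v_hat) + tiny)  # ascent step
        if t > steps - 1000:
            tail.append((params[0], math.exp(params[1])))
    # Average the last 1000 iterates to damp stochastic-gradient jitter
    mu_avg = sum(p[0] for p in tail) / len(tail)
    sd_avg = sum(p[1] for p in tail) / len(tail)
    return mu_avg, sd_avg


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_data(200, 2.0, 0.5, rng)
    print("exact posterior (mean, sd):", exact_posterior(xs, ys, 0.5))
    print("fitted surrogate (mean, sd):", fit_surrogate(xs, ys, 0.5, rng))
```

Because this toy model is conjugate, the best Gaussian surrogate coincides with the exact posterior, so the closed-form answer gives a direct check on the fit. In a model with a neural-network likelihood no such closed form exists, which is why a surrogate fitted on minibatches is attractive for large datasets.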