Spatiotemporal datasets, which consist of spatially referenced time series, are ubiquitous in scientific and business-intelligence applications such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As modern datasets continue to grow in size and complexity, there is an increasing need for statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle large prediction problems. This work presents the Bayesian Neural Field (BayesNF), a domain-general statistical model for inferring rich probability distributions over a spatiotemporal domain, which can be used for data-analysis tasks including forecasting, interpolation, and variography. BayesNF integrates a novel deep neural network architecture for high-capacity function estimation with hierarchical Bayesian inference for robust uncertainty quantification. Because the prior is defined through a sequence of smooth differentiable transforms, posterior inference can be conducted on large-scale data using variationally learned surrogates trained via stochastic gradient descent. We evaluate BayesNF against prominent statistical and machine-learning baselines, showing considerable improvements on diverse prediction problems from climate and public health datasets that contain tens to hundreds of thousands of measurements. The paper is accompanied by an open-source software package (https://github.com/google/bayesnf) that is easy to use and compatible with modern GPU and TPU accelerators on the JAX machine learning platform.