Probabilistic modeling of multidimensional spatiotemporal data is critical to many real-world applications. Because real-world spatiotemporal data often exhibits complex dependencies that are nonstationary and nonseparable, it is challenging to develop statistical models that are both effective and computationally efficient for processes containing long-range and short-scale variations, particularly for large-scale datasets with various corruption and missing-data structures. In this paper, we propose a new statistical framework, Bayesian Complementary Kernelized Learning (BCKL), to achieve scalable probabilistic modeling of multidimensional spatiotemporal data. To characterize complex dependencies effectively, BCKL integrates two complementary approaches: kernelized low-rank tensor factorization and short-range spatiotemporal Gaussian processes (GPs). Specifically, we use a multi-linear low-rank factorization component to capture the global, long-range correlations in the data, and we introduce an additive short-scale GP based on compactly supported kernel functions to characterize the remaining local variability. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for model inference and evaluate the proposed BCKL framework on both synthetic and real-world spatiotemporal datasets. Our experimental results show that BCKL offers superior performance in providing accurate posterior mean estimates and high-quality uncertainty estimates, confirming the importance of both the global and the local component in modeling spatiotemporal data.
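To make the additive structure concrete, the following minimal sketch draws a synthetic space-by-time matrix as the sum of a low-rank global component and a short-range local component. It is an illustration under stated assumptions, not the BCKL implementation: the grid sizes, rank, length scales, the squared-exponential prior on the factors, the Wendland kernel as the compactly supported kernel, the separable form of the local covariance, and all function names (`se_kernel`, `wendland_kernel`, `sample_gp`) are choices made here for the example. No inference (MCMC) is performed.

```python
import numpy as np

def se_kernel(x, length_scale):
    """Squared-exponential kernel (assumed prior for the smooth global factors)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def wendland_kernel(x, support):
    """Wendland C2 kernel: exactly zero beyond `support`, so the matrix is sparse."""
    r = np.abs(x[:, None] - x[None, :]) / support
    return np.clip(1.0 - r, 0.0, None) ** 4 * (4.0 * r + 1.0)

def sample_gp(K, n_draws, rng, jitter=1e-6):
    """Draw `n_draws` zero-mean GP samples with covariance K (one per column)."""
    L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    return L @ rng.standard_normal((K.shape[0], n_draws))

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 40)   # hypothetical spatial grid
t = np.linspace(0.0, 1.0, 60)   # hypothetical temporal grid
rank = 3                        # assumed rank of the global component

# Global component: low-rank factorization with GP-distributed (kernelized) factors.
U = sample_gp(se_kernel(s, 0.3), rank, rng)   # 40 x rank
V = sample_gp(se_kernel(t, 0.3), rank, rng)   # 60 x rank
global_part = U @ V.T

# Local component: short-range GP built from compactly supported kernels.
# A separable (Kronecker) covariance is assumed here purely for simplicity.
Ks = wendland_kernel(s, 0.1)
Kt = wendland_kernel(t, 0.1)
Ls = np.linalg.cholesky(Ks + 1e-6 * np.eye(s.size))
Lt = np.linalg.cholesky(Kt + 1e-6 * np.eye(t.size))
local_part = 0.3 * Ls @ rng.standard_normal((s.size, t.size)) @ Lt.T

# Observations: global + local + i.i.d. Gaussian noise.
Y = global_part + local_part + 0.05 * rng.standard_normal((s.size, t.size))

print("fraction of nonzeros in local spatial kernel:", np.mean(Ks > 0))
```

The compact support is what keeps the local component cheap in this sketch: most entries of `Ks` and `Kt` are exactly zero, so sparse linear algebra could be applied to them, while the low-rank term carries the long-range structure with only `rank` columns per dimension.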