In this era of big data, all scientific disciplines are evolving fast to cope with the enormity of the available information. So is statistics, the queen of science. Big data are particularly relevant to spatio-temporal statistics, thanks to much-improved technology in satellite-based remote sensing and Geographical Information Systems. However, none of the existing approaches seems to meet the simultaneous demands of reality emulation and cheap computation. In this article, with Lévy random fields as the starting point, we construct a new Bayesian nonparametric, nonstationary and nonseparable dynamic spatio-temporal model with the additional realistic property that the lagged spatio-temporal correlations converge to zero as the lag tends to infinity. Although our Bayesian model is intricately structured and is variable-dimensional with respect to each time index, we are able to devise a fast and efficient parallel Markov Chain Monte Carlo (MCMC) algorithm for Bayesian inference. Our simulation experiment shows quite encouraging performance of our Bayesian Lévy-dynamic approach. Finally, we apply our Bayesian Lévy-dynamic model and methods to a sea surface temperature dataset consisting of 139,300 data points in space and time. Although not big data in the true sense, this is a large and highly structured dataset by any standard. Even for this large and complex dataset, our parallel MCMC algorithm, implemented on 80 processors, generated 110,000 MCMC realizations from the Lévy-dynamic posterior within a single day, and the resulting Bayesian posterior predictive analysis turned out to be encouraging. Thus, it is not unreasonable to expect that, with significantly more computing resources, it is feasible to analyse terabytes of spatio-temporal data with our new model and methods.