Incorporating Subsampling into Bayesian Models for High-Dimensional Spatial Data

Additive spatial statistical models with weakly stationary process assumptions have become standard in spatial statistics. However, one disadvantage of such models is their computational cost, which grows rapidly with the number of data points. The goal of this article is to apply an existing subsampling strategy to standard spatial additive models and to derive its spatial statistical properties. We call this strategy the "spatial data subset model" approach; it can be applied to big datasets in a computationally feasible way. Our approach has the advantage that it requires no additional restrictive model assumptions. Indeed, within our framework the computational gains increase as model assumptions are removed. This provides one solution to the computational bottlenecks that arise when applying methods such as Kriging to "big data". We derive several properties of the spatial data subset model approach, covering moments, sill, nugget, and range, under several sampling designs. The biggest advantage of our approach is that it scales to a dataset of any size that can be stored. We present results of the spatial data subset model approach on simulated datasets and on a large dataset consisting of 150,000 observations of daytime land surface temperature measured by the MODIS instrument aboard the Terra satellite.
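The abstract does not spell out the sampling design or the estimator, so the following Python sketch rests entirely on assumptions of our own. It assumes a zero-mean field, a known exponential covariance (partial sill `psill`, range `range_`) plus a nugget, simple random subsets drawn without replacement, and simple-kriging predictions averaged over the subsets. The names `simple_krige` and `subset_krige` and all parameter values are hypothetical. The sketch is meant only to illustrate the cost argument: each solve involves an `n_sub × n_sub` covariance matrix, so it costs O(n_sub³) instead of the O(n³) of full-data Kriging.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n, 2) and b (m, 2)."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))


def exp_cov(d, psill, range_):
    """Exponential covariance of the latent process (nugget excluded)."""
    return psill * np.exp(-d / range_)


def simple_krige(x_obs, z_obs, x_new, psill, range_, nugget):
    """Zero-mean simple kriging of the latent process at x_new; O(n^3) in len(x_obs)."""
    # Observation covariance: latent covariance plus nugget on the diagonal.
    C = exp_cov(pairwise_dist(x_obs, x_obs), psill, range_) + nugget * np.eye(len(x_obs))
    # Cross-covariance between observed and prediction locations, shape (n_obs, n_new).
    c0 = exp_cov(pairwise_dist(x_obs, x_new), psill, range_)
    weights = np.linalg.solve(C, c0)  # C^{-1} c0
    return weights.T @ z_obs


def subset_krige(x, z, x_new, n_sub, n_rep, rng, **cov_params):
    """Average simple-kriging predictions over n_rep random subsets of size n_sub.

    Hypothetical subsampling scheme for illustration, not necessarily the
    authors' design. Each fit costs O(n_sub^3) instead of O(n^3).
    """
    preds = np.empty((n_rep, len(x_new)))
    for r in range(n_rep):
        idx = rng.choice(len(x), size=n_sub, replace=False)  # simple random sampling
        preds[r] = simple_krige(x[idx], z[idx], x_new, **cov_params)
    return preds.mean(axis=0)


# Simulate a zero-mean field with exponential covariance plus a nugget (illustrative values).
rng = np.random.default_rng(0)
params = dict(psill=1.0, range_=0.2, nugget=0.1)
n_obs, n_new = 1000, 50
locs = rng.uniform(size=(n_obs + n_new, 2))
K = exp_cov(pairwise_dist(locs, locs), params["psill"], params["range_"])
latent = np.linalg.cholesky(K + 1e-8 * np.eye(len(locs))) @ rng.standard_normal(len(locs))
x_obs, x_new = locs[:n_obs], locs[n_obs:]
z_obs = latent[:n_obs] + np.sqrt(params["nugget"]) * rng.standard_normal(n_obs)
truth = latent[n_obs:]

# Compare full-data kriging with the subset-averaged predictor.
pred_full = simple_krige(x_obs, z_obs, x_new, **params)
pred_sub = subset_krige(x_obs, z_obs, x_new, n_sub=200, n_rep=10, rng=rng, **params)
mse_full = np.mean((pred_full - truth) ** 2)
mse_sub = np.mean((pred_sub - truth) ** 2)
print(f"full-data MSE {mse_full:.3f}, subset-average MSE {mse_sub:.3f}")
```

Averaging over repeated subsets is only one illustrative design choice. The subset size and the number of repetitions trade prediction accuracy against cost, and the moment, sill, nugget, and range properties the authors derive for specific sampling designs are not reproduced by this sketch.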