Additive spatial statistical models with weakly stationary process assumptions have become standard in spatial statistics. However, one disadvantage of such models is their computation time, which increases rapidly with the number of data points. The goal of this article is to apply an existing subsampling strategy to standard spatial additive models and to derive the resulting spatial statistical properties. We call this strategy the ``spatial data subset model'' approach; it can be applied to big datasets in a computationally feasible way. An advantage of our approach is that it does not require any additional restrictive model assumptions. In fact, within our framework the computational gains increase as model assumptions are removed. This provides one solution to the computational bottlenecks that occur when methods such as Kriging are applied to ``big data''. We derive several properties of the spatial data subset model approach in terms of moments, sill, nugget, and range under several sampling designs. The biggest advantage of our approach is that it scales to a dataset of any size that can be stored. We present results of the spatial data subset model approach on simulated datasets and on a large dataset consisting of 150,000 observations of daytime land surface temperatures measured by the MODIS instrument onboard the Terra satellite.
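As a rough illustration of why subsampling eases the computational burden, the sketch below draws a simple random subsample of size n from N spatial observations and computes the classical (Matheron) empirical semivariogram, whose shape is what nugget, sill, and range describe, on that subsample only. The pairwise cost therefore drops from O(N^2) to O(n^2). This is a generic illustration under stated assumptions, not the article's estimator: the simple random sampling design, the function names `simple_random_subsample` and `empirical_semivariogram`, the bin edges, and the synthetic field are all hypothetical choices.

```python
import math
import random


def simple_random_subsample(points, n, seed=0):
    """Draw n points without replacement (simple random sampling is an assumed design)."""
    rng = random.Random(seed)
    return rng.sample(points, n)


def empirical_semivariogram(points, bin_edges):
    """Matheron estimator: gamma(h) = sum over pairs in bin of (z_i - z_j)^2 / (2 * |N(h)|).

    points: list of (x, y, z) tuples.
    bin_edges: increasing distance-bin edges; a pair falls in bin b if
               bin_edges[b] <= distance < bin_edges[b + 1].
    Returns a list of (bin_center, gamma, n_pairs); gamma is None for empty bins.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    # Loop over each unordered pair once: O(n^2) in the number of points passed in.
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            for b in range(n_bins):
                if bin_edges[b] <= h < bin_edges[b + 1]:
                    sums[b] += (zi - zj) ** 2
                    counts[b] += 1
                    break
    result = []
    for b in range(n_bins):
        center = 0.5 * (bin_edges[b] + bin_edges[b + 1])
        gamma = sums[b] / (2 * counts[b]) if counts[b] else None
        result.append((center, gamma, counts[b]))
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical synthetic field on the unit square: smooth signal plus independent noise.
    N = 5000
    data = []
    for _ in range(N):
        x, y = rng.random(), rng.random()
        data.append((x, y, math.sin(3 * x) + math.cos(3 * y) + rng.gauss(0.0, 0.1)))
    # Work with a subsample of 300 points instead of all N.
    sub = simple_random_subsample(data, 300, seed=2)
    edges = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    for center, gamma, npairs in empirical_semivariogram(sub, edges):
        print(f"h={center:.2f} gamma={gamma} pairs={npairs}")
```

In this toy setting the noise term plays the role of a nugget and the smooth signal produces a semivariogram that grows with distance. How the subsample's moments, sill, nugget, and range relate to those of the full process under different sampling designs is exactly what the article derives, and the sketch does not attempt to reproduce those results.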