Additive spatial statistical models with weakly stationary process assumptions have become standard in spatial statistics. However, one disadvantage of such models is their computation time, which increases rapidly with the number of data points. The goal of this article is to apply an existing subsampling strategy to standard spatial additive models and to derive its spatial statistical properties. We call this strategy the "spatial data subset model" (SDSM) approach; it can be applied to big datasets in a computationally feasible way. Our approach has the advantage that it does not require any additional restrictive model assumptions. That is, computational gains increase as model assumptions are removed when using our model framework. This provides one solution to the computational bottlenecks that occur when applying methods such as kriging to "big data". We provide several properties of this new spatial data subset model approach in terms of moments, sill, nugget, and range under several sampling designs. An advantage of our approach is that it subsamples without throwing away data, and it can be implemented using datasets of any size that can be stored. We present the results of the spatial data subset model approach on simulated datasets and on a large dataset consisting of 150,000 observations of daytime land surface temperatures measured by the MODIS instrument onboard the Terra satellite.
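As a rough illustration of why working on subsets can ease the kriging bottleneck, the sketch below splits the data into disjoint random subsets, so every observation is used exactly once, and solves one small linear system per subset instead of one large one. It is not the SDSM estimator described in the article: the exponential covariance, the parameter values, the averaging of per-subset predictions, and the helper names `exp_cov`, `simple_krige`, and `subset_krige` are all assumptions made purely for illustration.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, range_=0.2):
    """Exponential covariance between two sets of 2-D locations (assumed model)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

def simple_krige(locs, y, new_locs, nugget=0.1):
    """Zero-mean simple kriging predictor; the solve is cubic in len(locs)."""
    K = exp_cov(locs, locs) + nugget * np.eye(len(locs))
    weights = np.linalg.solve(K, y)
    return exp_cov(new_locs, locs) @ weights

def subset_krige(locs, y, new_locs, n_subsets, rng):
    """Average kriging predictions over a random partition of the data.

    Illustrative combination rule only, not the article's method.
    """
    idx = rng.permutation(len(locs))
    parts = np.array_split(idx, n_subsets)  # disjoint; each point used once
    preds = [simple_krige(locs[p], y[p], new_locs) for p in parts]
    return np.mean(preds, axis=0)

# Synthetic example data (hypothetical, not the MODIS dataset).
rng = np.random.default_rng(0)
locs = rng.uniform(size=(600, 2))
y = np.sin(4 * locs[:, 0]) + 0.1 * rng.standard_normal(600)
new_locs = rng.uniform(size=(5, 2))
pred = subset_krige(locs, y, new_locs, n_subsets=6, rng=rng)
```

With n observations split into k subsets of equal size, the dense solves cost on the order of k(n/k)^3 operations rather than n^3, which is where the saving in this sketch comes from.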