Indexing and Partitioning the Spatial Linear Model for Large Data Sets

We consider four main goals when fitting spatial linear models: 1) estimating covariance parameters, 2) estimating fixed effects, 3) kriging (making point predictions), and 4) block kriging (predicting the average value over a region). Each of these goals can present different challenges when analyzing large spatial data sets. Current research uses a variety of methods to achieve these goals, including spatial basis functions (reduced rank) and covariance tapering, among others. However, spatial indexing, which is very similar to composite likelihood, offers some advantages. We develop a simple framework for all four goals by using indexing to create a block covariance structure and nearest-neighbor predictions while maintaining a coherent linear model. We show exact inference for fixed effects under this block covariance construction. Spatial indexing is very fast, and we use simulations to validate the methods and to compare them with another popular method. We study various sample designs for indexing, and our simulations show that indexing that leads to spatially compact partitions performs best over a range of sample sizes, autocorrelation values, and generating processes. Partitions can be kept small, on the order of 50 samples per partition. We use nearest neighbors for kriging and block kriging, finding that 50 nearest neighbors are sufficient. In all cases, confidence intervals for fixed effects and prediction intervals for (block) kriging have appropriate coverage. Some advantages of spatial indexing are that it is available for any valid covariance matrix, can take advantage of parallel computing, and extends easily to non-Euclidean topologies, such as stream networks. Using an example stream-network data set, we show how spatial indexing can achieve all four goals for very large data sets in a matter of minutes rather than days.
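As a rough illustration of the block covariance idea for goal 2, the sketch below estimates fixed effects by generalized least squares when the covariance matrix is treated as block diagonal across partitions, so only small per-partition matrices are ever factorized. Everything named here is an assumption made for illustration and is not taken from the paper: the helpers `exp_cov`, `grid_partition`, and `block_gls` are hypothetical, the exponential covariance with a nugget is one arbitrary valid choice, a regular grid stands in for whatever spatially compact partitioning is actually used, and the covariance parameters are taken as already known. The returned covariance of the estimate is the plain model-based one under the block-diagonal assumption, which may differ from the exact inference developed in the paper.

```python
import numpy as np


def exp_cov(a, b, sill, range_):
    """Exponential covariance between two sets of 2-D points (assumed model)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def grid_partition(coords, cells_per_side):
    """Hypothetical helper: label each point by the grid cell it falls in,
    which gives spatially compact (though unequal-sized) partitions."""
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    scaled = (coords - lo) / (hi - lo)
    cell = np.minimum((scaled * cells_per_side).astype(int), cells_per_side - 1)
    return cell[:, 0] * cells_per_side + cell[:, 1]


def block_gls(X, y, coords, labels, sill, range_, nugget):
    """GLS fixed effects under a block-diagonal covariance.

    Accumulates X_g' S_g^{-1} X_g and X_g' S_g^{-1} y_g over partitions g,
    where S_g is the covariance matrix within partition g only.
    """
    p = X.shape[1]
    xtx = np.zeros((p, p))
    xty = np.zeros(p)
    for g in np.unique(labels):
        idx = labels == g
        Xg, yg, cg = X[idx], y[idx], coords[idx]
        Sg = exp_cov(cg, cg, sill, range_) + nugget * np.eye(len(yg))
        L = np.linalg.cholesky(Sg)
        Xw = np.linalg.solve(L, Xg)  # whitened design for this partition
        yw = np.linalg.solve(L, yg)  # whitened response for this partition
        xtx += Xw.T @ Xw
        xty += Xw.T @ yw
    cov_beta = np.linalg.inv(xtx)  # model-based covariance of the estimate
    beta = cov_beta @ xty
    return beta, cov_beta
```

Because each partition contributes independently to the two running sums, the loop body could be distributed across workers, which is consistent with the parallel-computing advantage noted above. With a single partition the function reduces to ordinary GLS on the full covariance matrix.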
