Comparing spatial data sets is a ubiquitous task in data analysis. However, the presence of spatial autocorrelation means that standard estimates of variance will be wrong and will tend to over-estimate the statistical significance of correlations and other observations. While there are a number of existing approaches to this problem, none is ideal: they require either detailed analytical calculations, which are hard to generalise, or detailed knowledge of the data generating process, which may not be available. In this work we propose a resampling approach based on Tobler's Law. By resampling the data with fixed spatial autocorrelation, measured by Moran's I, we generate a more realistic null model. Testing on real and synthetic data, we find that, as long as the spatial autocorrelation is not too strong, this approach works just as well as if we knew the data generating process.
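To make the idea of a null model with fixed Moran's I concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a regular grid with rook (edge-sharing) neighbours, and it stands in a simple greedy swap search for the resampling step: the values are shuffled, then pairwise swaps are kept only when they move Moran's I closer to the value measured on the original data. The function names (`rook_weights`, `morans_i`, `resample_fixed_moran`), the tolerance, the swap budget, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def rook_weights(rows, cols):
    """Binary weight matrix for a rows x cols grid; cells sharing an edge are neighbours."""
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < rows:
                w[i, i + cols] = w[i + cols, i] = 1.0
    return w

def morans_i(values, weights):
    """Moran's I: (n / sum(w)) * (z' W z) / (z' z), with z the mean-centred values."""
    z = values - values.mean()
    return len(values) / weights.sum() * (z @ weights @ z) / (z @ z)

def resample_fixed_moran(values, weights, rng, tol=0.02, n_swaps=500):
    """Shuffle the values, then greedily swap pairs to approach the original Moran's I.

    Illustrative stand-in for the resampling step: the search may stop
    short of the target if the swap budget runs out.
    """
    target = morans_i(values, weights)
    sample = rng.permutation(values)
    current = morans_i(sample, weights)
    for _ in range(n_swaps):
        if abs(current - target) <= tol:
            break
        i, j = rng.integers(0, len(sample), size=2)
        sample[i], sample[j] = sample[j], sample[i]
        proposed = morans_i(sample, weights)
        if abs(proposed - target) < abs(current - target):
            current = proposed  # keep the swap
        else:
            sample[i], sample[j] = sample[j], sample[i]  # undo the swap
    return sample, current

# Toy usage: null distribution for the correlation between two gridded variables.
rng = np.random.default_rng(0)
w = rook_weights(8, 8)
x = np.add.outer(np.arange(8), np.arange(8)).ravel().astype(float)  # smooth gradient
y = x + rng.normal(scale=4.0, size=x.size)                          # noisy copy of x
observed = np.corrcoef(x, y)[0, 1]
null = [np.corrcoef(resample_fixed_moran(x, w, rng)[0], y)[0, 1] for _ in range(99)]
p_value = (1 + sum(abs(r) >= abs(observed) for r in null)) / (1 + len(null))
```

Because each resample is a permutation of the original values, the marginal distribution is preserved exactly and only the spatial arrangement changes. A plain permutation test corresponds to skipping the swap loop, which discards the spatial structure and is the case the abstract warns will overstate significance.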