Spatial confounding is a fundamental issue in regression models for spatially indexed data. It arises because spatial random effects, included to approximate unmeasured spatial variation, are typically not independent of the covariates in the model. This can lead to substantial bias in covariate effect estimates. Despite extensive research, spatial confounding remains a source of considerable confusion, with results that are sometimes puzzling and seemingly contradictory. In this paper we develop a broad theoretical framework that brings mathematical clarity to the mechanisms of spatial confounding, providing explicit and interpretable analytical expressions for the resulting bias. From these expressions, we see that the problem is directly linked to spatial smoothing, and we can identify exactly how the features of the model and the data generation process affect the occurrence and size of the bias. We also use our framework to understand and generalise some of the main existing results on spatial confounding, including suggested methods for bias adjustment. Thus, our comprehensive and mathematically explicit approach clears up existing confusion and demystifies the issue of spatial confounding.
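The link between bias and spatial smoothing can be illustrated with a small simulation. The sketch below is a toy example and not the framework developed in the paper: it assumes a one-dimensional domain, a sinusoidal unmeasured confounder, a Fourier basis standing in for the spatial effect, and a ridge penalty standing in for spatial smoothing. All function names (`simulate`, `fourier_basis`, `beta_hat`) and parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=500, beta=1.0, seed=0):
    """Toy data: covariate x and response y share an unmeasured spatial term z."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n)               # 1-D spatial locations (assumption)
    z = np.sin(2 * np.pi * s)                  # unmeasured smooth spatial confounder
    x = z + 0.5 * rng.standard_normal(n)       # covariate correlated with z
    y = beta * x + z + 0.5 * rng.standard_normal(n)
    return s, x, y


def fourier_basis(s, k=3):
    """Low-order Fourier basis used here as a stand-in for a spatial effect."""
    cols = [f(2 * np.pi * j * s) for j in range(1, k + 1) for f in (np.sin, np.cos)]
    return np.column_stack(cols)


def beta_hat(x, y, B, lam):
    """Penalised least squares with a ridge penalty lam on the basis coefficients only.

    The intercept and the covariate coefficient are left unpenalised.
    Returns the estimated covariate coefficient.
    """
    X = np.column_stack([np.ones_like(x), x, B])
    P = np.zeros((X.shape[1], X.shape[1]))
    P[2:, 2:] = lam * np.eye(B.shape[1])       # penalise spatial coefficients only
    coef = np.linalg.solve(X.T @ X + P, X.T @ y)
    return coef[1]


beta_true = 1.0
s, x, y = simulate(beta=beta_true)
B = fourier_basis(s)

# Heavier smoothing (larger lam) shrinks the spatial term towards zero,
# so more of the confounder's effect is attributed to the covariate.
estimates = {lam: beta_hat(x, y, B, lam) for lam in (0.0, 1.0, 100.0, 1e6)}
for lam, b in estimates.items():
    print(f"lam={lam:>9.1f}  beta_hat={b:.3f}  bias={b - beta_true:+.3f}")
```

In this toy setup the unpenalised fit (`lam=0`) can represent the confounder exactly, so the covariate estimate stays close to the true value, while a very heavy penalty effectively removes the spatial term and reproduces the bias of a regression that omits it. Real spatial models use structured penalties or covariance functions rather than a plain ridge penalty, so the sketch only conveys the qualitative mechanism.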