Consistency of common spatial estimators under spatial confounding

This paper studies the asymptotic performance of popular spatial regression estimators in estimating the linear effect of an exposure on an outcome under "spatial confounding" -- the presence of an unmeasured, spatially structured variable that influences both the exposure and the outcome. The existing literature on spatial confounding is informal and inconsistent; this paper attempts to bring clarity through rigorous results on the asymptotic bias and consistency of estimators from popular spatial regression models. We consider two data generation processes: one where the confounder is a fixed function of space and one where it is a random function (i.e., a stochastic process on the spatial domain). We first show that the estimators from ordinary least squares (OLS) and restricted spatial regression are asymptotically biased under spatial confounding. We then prove a novel main result: the generalized least squares (GLS) estimator using a Gaussian process (GP) covariance matrix is consistent in the presence of spatial confounding under in-fill (fixed domain) asymptotics. The result holds under very general conditions -- for any exposure with some non-spatial variation (noise), for any spatially continuous confounder, for any Mat\'ern or squared exponential Gaussian process covariance used to construct the GLS estimator, and without requiring Gaussianity of errors. Finally, we prove that spatial estimators from GLS, GP regression, and spline models that are consistent under confounding by a fixed function are also consistent under confounding by a random function. We conclude that, contrary to much of the literature on spatial confounding, traditional spatial estimators are capable of estimating linear exposure effects under spatial confounding, provided there is some noise in the exposure. We support our theoretical arguments with simulation studies.
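The contrast between OLS and GLS described above can be illustrated with a minimal simulation sketch. It is not the paper's simulation design; every specific here is an assumption chosen for illustration: the unit-square domain, the sinusoidal fixed-function confounder `w`, the noise levels, the exponential covariance (a Mat\'ern covariance with smoothness 1/2) with range 0.3, and the nugget term 0.25 added for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 800, 2.0  # sample size and true exposure effect (illustrative values)

# Locations in the unit square (fixed domain, assumed for illustration).
coords = rng.uniform(0.0, 1.0, size=(n, 2))

# Unmeasured confounder: a fixed, spatially continuous function of location.
w = np.sin(2 * np.pi * coords[:, 0]) + np.cos(2 * np.pi * coords[:, 1])

# Exposure = confounder + non-spatial noise; outcome depends on both.
x = w + rng.normal(0.0, 1.0, size=n)
y = beta * x + w + rng.normal(0.0, 0.5, size=n)

# Design matrix with an intercept; the confounder w is deliberately omitted.
D = np.column_stack([np.ones(n), x])

def ols(D, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(D, y, rcond=None)[0]

def gls(D, y, Sigma):
    """GLS coefficients (D' Sigma^-1 D)^-1 D' Sigma^-1 y for a working covariance Sigma."""
    Si_D = np.linalg.solve(Sigma, D)
    Si_y = np.linalg.solve(Sigma, y)
    return np.linalg.solve(D.T @ Si_D, D.T @ Si_y)

# Working GP covariance: exponential kernel plus a nugget (both assumed, not estimated).
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = np.exp(-dist / 0.3) + 0.25 * np.eye(n)

beta_ols = ols(D, y)[1]
beta_gls = gls(D, y, Sigma)[1]
print(f"true beta = {beta}, OLS = {beta_ols:.3f}, GLS = {beta_gls:.3f}")
```

In this setup the OLS slope absorbs part of the confounder's effect, while the GLS weighting downweights the smooth spatial component of the exposure and relies mostly on its non-spatial noise, so the GLS estimate should land much closer to the true value. The covariance parameters are fixed by hand rather than estimated, which is a simplification.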
