Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for both identification and estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
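The residual-on-residual idea behind DML can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes a simplified partially linear model with a constant effect `theta`, it assumes the unmeasured confounder is a smooth function of location so that the spatial coordinates can stand in for it, and it uses a hand-written k-nearest-neighbour smoother as the "flexible model". The simulated data, the function names, and all parameter values are hypothetical, and the sketch does not attempt inference under spatial dependence.

```python
import math
import random


def knn_predict(train_xy, train_val, query_xy, k=10):
    """Predict at each query location by averaging the k nearest training values."""
    preds = []
    for qx, qy in query_xy:
        nearest = sorted(
            ((tx - qx) ** 2 + (ty - qy) ** 2, v)
            for (tx, ty), v in zip(train_xy, train_val)
        )[:k]
        preds.append(sum(v for _, v in nearest) / k)
    return preds


def cross_fit_residuals(coords, values, folds, k=10):
    """Out-of-fold residuals: each point is predicted from the other folds only."""
    resid = [0.0] * len(values)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(values)) if i not in held]
        preds = knn_predict(
            [coords[i] for i in train],
            [values[i] for i in train],
            [coords[i] for i in held_out],
            k,
        )
        for i, p in zip(held_out, preds):
            resid[i] = values[i] - p
    return resid


def dml_partially_linear(coords, exposure, outcome, n_folds=2, k=10, seed=0):
    """Regress exposure and outcome on location, then regress residual on residual."""
    idx = list(range(len(outcome)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    res_a = cross_fit_residuals(coords, exposure, folds, k)
    res_y = cross_fit_residuals(coords, outcome, folds, k)
    return sum(a * y for a, y in zip(res_a, res_y)) / sum(a * a for a in res_a)


def ols_slope(x, y):
    """Slope of a simple least-squares fit of y on x, ignoring confounding."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=600, theta=1.0, seed=1):
    """Hypothetical data: a smooth spatial confounder u drives exposure and outcome."""
    rng = random.Random(seed)
    coords, exposure, outcome = [], [], []
    for _ in range(n):
        s1, s2 = rng.random(), rng.random()
        u = math.sin(2 * math.pi * s1) + math.cos(2 * math.pi * s2)  # never observed
        a = u + rng.gauss(0, 1)
        y = theta * a + 2 * u + rng.gauss(0, 1)
        coords.append((s1, s2))
        exposure.append(a)
        outcome.append(y)
    return coords, exposure, outcome


coords, exposure, outcome = simulate()  # true theta is 1.0
theta_naive = ols_slope(exposure, outcome)
theta_dml = dml_partially_linear(coords, exposure, outcome)
print(f"naive slope: {theta_naive:.2f}, DML estimate: {theta_dml:.2f}")
```

In this toy setup the naive slope absorbs the confounder's effect, while the cross-fitted residual regression should land much closer to the true value. The nuisance learner is deliberately crude; any flexible regression of exposure and outcome on location and measured covariates could take its place.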