Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature has provided conflicting definitions, and many proposed definitions do not address the issue of confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose using double machine learning (DML), a procedure in which flexible models are used to regress both the exposure and outcome variables on confounders to arrive at a causal estimator with favorable robustness properties and convergence rates, and we prove that this approach is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) in either identification or estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
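To make the two-regression idea concrete, the following is a minimal sketch of cross-fitted "partialling-out" DML on simulated data. It is not the estimator developed in the paper. It assumes a partially linear model with a constant exposure effect, which is more restrictive than the nonparametric setting described above. It also uses the spatial coordinates as a stand-in for the unmeasured spatial confounder and a plain k-nearest-neighbour average as the "flexible model". All function names, the simulated data-generating process, and the tuning values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate(n=800, seed=0, effect=2.0):
    """Simulate data with a smooth, unobserved spatial confounder U (illustrative assumption)."""
    rng = random.Random(seed)
    coords, exposure, outcome = [], [], []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        u = math.sin(2 * math.pi * x) + math.cos(2 * math.pi * y)  # unmeasured confounder
        a = u + rng.gauss(0.0, 1.0)                                # continuous exposure
        out = effect * a + 3.0 * u + rng.gauss(0.0, 1.0)           # outcome
        coords.append((x, y))
        exposure.append(a)
        outcome.append(out)
    return coords, exposure, outcome


def naive_slope(exposure, outcome):
    """Least-squares slope of outcome on exposure, ignoring space entirely."""
    n = len(exposure)
    ma, my = sum(exposure) / n, sum(outcome) / n
    cov = sum((a - ma) * (y - my) for a, y in zip(exposure, outcome))
    var = sum((a - ma) ** 2 for a in exposure)
    return cov / var


def knn_predict(train_xy, train_vals, query_xy, k=15):
    """Predict at each query location by averaging the k nearest training values."""
    preds = []
    for qx, qy in query_xy:
        by_dist = sorted(
            (math.hypot(qx - tx, qy - ty), v)
            for (tx, ty), v in zip(train_xy, train_vals)
        )
        nearest = by_dist[:k]
        preds.append(sum(v for _, v in nearest) / len(nearest))
    return preds


def dml_partial_linear(coords, exposure, outcome, n_folds=2, k=15, seed=0):
    """Cross-fitted residual-on-residual estimate of a constant exposure effect."""
    n = len(coords)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    res_a, res_y = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_xy = [coords[i] for i in train]
        test_xy = [coords[i] for i in held_out]
        # Nuisance regressions are fit on the other fold(s) only (cross-fitting).
        a_hat = knn_predict(train_xy, [exposure[i] for i in train], test_xy, k)
        y_hat = knn_predict(train_xy, [outcome[i] for i in train], test_xy, k)
        for i, a_pred, y_pred in zip(held_out, a_hat, y_hat):
            res_a[i] = exposure[i] - a_pred
            res_y[i] = outcome[i] - y_pred
    # Regress outcome residuals on exposure residuals.
    num = sum(ra * ry for ra, ry in zip(res_a, res_y))
    den = sum(ra * ra for ra in res_a)
    return num / den


if __name__ == "__main__":
    coords, exposure, outcome = simulate()
    print("naive slope:", naive_slope(exposure, outcome))
    print("DML estimate:", dml_partial_linear(coords, exposure, outcome))
```

In this simulation the exposure effect is set to 2. The naive slope absorbs the spatial confounder, while the residual-on-residual estimate should land much closer to the true value because both nuisance regressions remove the smooth spatial signal before the final step. Cross-fitting is used so that each observation's residual comes from a model that was not trained on it.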