Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature offers conflicting definitions, and many of them do not address confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. We propose using double machine learning (DML), a procedure in which flexible models regress both the exposure and the outcome on the confounders, yielding a causal estimator with favorable robustness properties and convergence rates, and we prove that this estimator is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that relies on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for neither identification nor estimation. We demonstrate the advantages of the DML approach analytically and in simulations. Finally, we apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
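The residual-on-residual idea behind DML can be sketched for its simplest case, the partially linear model Y = θA + g(S) + ε with A = m(S) + ν. In this model, θ is estimated by regressing the outcome residual Y − ℓ̂(S) on the exposure residual A − m̂(S), with both nuisance functions fitted on held-out folds (cross-fitting). This is a minimal illustration under stated assumptions, not the authors' estimator. The unmeasured confounder is assumed to be a smooth function of location S. k-nearest-neighbour regression on coordinates stands in for the "flexible models". Folds are a simple interleaved split that ignores spatial dependence. The data-generating process (`simulate`) and all function names are hypothetical. Unlike the nonparametric framework in the abstract, this model assumes a constant effect θ. Under effect heterogeneity, it targets a weighted average of local effects instead.

```python
import math
import random


def knn_predict(train_pts, train_vals, query, k):
    """Average the values of the k training points nearest to `query` (Euclidean)."""
    dists = sorted((math.dist(p, query), v) for p, v in zip(train_pts, train_vals))
    nearest = dists[:k]
    return sum(v for _, v in nearest) / len(nearest)


def dml_partially_linear(coords, a, y, n_folds=5, k=10):
    """Cross-fitted residual-on-residual estimate of theta in
    Y = theta * A + g(S) + noise,  A = m(S) + noise,
    assuming (hypothetically) that confounding enters only through location S."""
    n = len(y)
    a_res = [0.0] * n
    y_res = [0.0] * n
    for fold in range(n_folds):
        # Simple interleaved split; spatially aware folds are not attempted here.
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        tr_pts = [coords[i] for i in train]
        tr_a = [a[i] for i in train]
        tr_y = [y[i] for i in train]
        for i in test:
            # Nuisance predictions for fold members use only the other folds.
            a_res[i] = a[i] - knn_predict(tr_pts, tr_a, coords[i], k)
            y_res[i] = y[i] - knn_predict(tr_pts, tr_y, coords[i], k)
    # Final stage: no-intercept regression of outcome residuals on exposure residuals.
    num = sum(ar * yr for ar, yr in zip(a_res, y_res))
    den = sum(ar * ar for ar in a_res)
    return num / den


def simulate(n=500, theta=2.0, seed=0):
    """Hypothetical data with an unmeasured confounder that is a smooth spatial surface."""
    rng = random.Random(seed)
    coords, a, y = [], [], []
    for _ in range(n):
        s = (rng.random(), rng.random())
        u = math.sin(2 * math.pi * s[0]) + math.cos(2 * math.pi * s[1])
        exposure = u + rng.gauss(0.0, 1.0)
        outcome = theta * exposure + 3.0 * u + rng.gauss(0.0, 1.0)
        coords.append(s)
        a.append(exposure)
        y.append(outcome)
    return coords, a, y


def naive_slope(a, y):
    """OLS slope of y on a (with intercept), ignoring the spatial confounder."""
    ma, my = sum(a) / len(a), sum(y) / len(y)
    cov = sum((ai - ma) * (yi - my) for ai, yi in zip(a, y))
    var = sum((ai - ma) ** 2 for ai in a)
    return cov / var


if __name__ == "__main__":
    coords, a, y = simulate()
    print("naive OLS slope:", round(naive_slope(a, y), 2))
    print("DML estimate:   ", round(dml_partially_linear(coords, a, y), 2))
```

Under this simulated design, the naive slope is biased upward by the confounder (its population value is 3.5). The cross-fitted estimate should land near the true θ = 2, up to sampling error and some smoothing bias from the k-NN fits.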