Addressing spatial confounding has recently become a major topic in spatial statistics. However, the literature offers conflicting definitions, many of which do not correspond to confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. For estimation, we propose double machine learning (DML), a procedure in which flexible models regress both the exposure and the outcome on the confounders, yielding a causal estimator with favorable robustness properties and convergence rates. We prove that this estimator is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that does not rely on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for either identification or estimation. We demonstrate the advantages of the DML approach analytically and in simulations. Finally, we apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
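To make the DML idea concrete, the following is a minimal sketch, not the authors' estimator. It assumes a partially linear model Y = θA + g(S) + ε with A = m(S) + ν, where the spatially structured confounder is treated as a smooth function of the coordinates S, so S serves as the covariates. A simple k-nearest-neighbour regressor stands in for the "flexible models", and the function names `knn_predict` and `dml_partially_linear` are hypothetical. Two caveats: the partially linear model imposes a constant effect θ, which the framework described above does not require, and both the random fold split and the standard error assume independent observations, whereas the abstract's theory covers spatial dependence.

```python
import numpy as np

def knn_predict(x_train, y_train, x_test, k=10):
    """k-nearest-neighbour regression, a stand-in 'flexible model' (hypothetical helper)."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def dml_partially_linear(y, a, x, n_folds=5, k=10, seed=0):
    """Cross-fitted partialling-out DML estimate of theta in
    Y = theta*A + g(X) + eps,  A = m(X) + nu  (hypothetical helper)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels (assumes independence)
    y_res = np.empty(n)
    a_res = np.empty(n)
    for f in range(n_folds):
        test = folds == f
        train = ~test
        # Fit nuisance regressions on the other folds, predict on the held-out fold
        y_res[test] = y[test] - knn_predict(x[train], y[train], x[test], k)
        a_res[test] = a[test] - knn_predict(x[train], a[train], x[test], k)
    # Regress outcome residuals on exposure residuals
    theta = np.sum(a_res * y_res) / np.sum(a_res ** 2)
    # Influence-function standard error; valid only for independent observations
    psi = (y_res - theta * a_res) * a_res
    se = np.sqrt(np.mean(psi ** 2)) / np.mean(a_res ** 2) / np.sqrt(n)
    return theta, se

# Illustrative simulation (made-up data-generating process, true effect = 2)
rng = np.random.default_rng(1)
n = 1000
s = rng.uniform(size=(n, 2))  # spatial coordinates
u = np.sin(2 * np.pi * s[:, 0]) + np.cos(2 * np.pi * s[:, 1])  # spatial confounder, not passed to the estimator
a = u + rng.normal(size=n)  # continuous exposure
y = 2.0 * a + 3.0 * u + rng.normal(size=n)  # outcome

theta, se = dml_partially_linear(y, a, s)
naive = np.polyfit(a, y, 1)[0]  # slope of Y on A, ignoring the confounder
print(f"DML: {theta:.3f} (se {se:.3f}); naive slope: {naive:.3f}")
```

In this simulated setting the naive slope absorbs the confounder's contribution (its population value is 3.5), while the cross-fitted residual-on-residual regression should land near the true value of 2. Handling spatial dependence would require, at a minimum, a spatially blocked fold split and a dependence-robust variance estimate in place of the independent-data versions used here.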