Confounding control is crucial yet challenging for causal inference based on observational studies. Under the standard unconfoundedness assumption, augmented inverse probability weighting (AIPW) has been popular for estimating the average causal effect (ACE) because of its double robustness: the estimator is consistent if either the propensity score model or the outcome mean model is correctly specified. To make the unconfoundedness assumption plausible, researchers often collect a rich set of pretreatment variables, which makes variable selection imperative. It is well known that selecting variables for the propensity score with the goal of accurate treatment prediction may yield a highly variable ACE estimator, because such selection tends to include instrumental variables. Thus, many recent works recommend selecting all outcome predictors for both confounding control and efficient estimation. This article shows that the AIPW estimator with variable selection targeted at efficient estimation may lose the desirable double robustness property. Instead, we propose including in the propensity score model any covariate that is a predictor of the treatment, the outcome, or both, which preserves the double robustness of the AIPW estimator. Using this principle, we propose a two-stage procedure that uses penalization for variable selection and the AIPW estimator for estimation. We show that the proposed procedure retains the double robustness property. We evaluate the finite-sample performance of the AIPW estimator under various variable selection criteria through simulation and an application.
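For readers unfamiliar with the estimator, the following is a minimal sketch of the standard AIPW formula for the ACE, assuming the nuisance quantities have already been fitted by some method. The function name `aipw_ace` and the argument names `e_hat` (fitted propensity scores), `m1_hat`, and `m0_hat` (fitted outcome means under treatment and control) are illustrative choices, not notation from the article, and the sketch does not implement the proposed two-stage selection procedure.

```python
from statistics import fmean


def aipw_ace(y, a, e_hat, m1_hat, m0_hat):
    """AIPW estimate of the ACE, E[Y(1)] - E[Y(0)], from fitted nuisance values.

    y      : observed outcomes
    a      : binary treatment indicators (1 = treated, 0 = control)
    e_hat  : fitted propensity scores P(A = 1 | X)   (hypothetical name)
    m1_hat : fitted outcome means E[Y | A = 1, X]    (hypothetical name)
    m0_hat : fitted outcome means E[Y | A = 0, X]    (hypothetical name)
    """
    terms = []
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, m1_hat, m0_hat):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1 + ai * (yi - m1) / ei
        control = m0 + (1 - ai) * (yi - m0) / (1.0 - ei)
        terms.append(treated - control)
    return fmean(terms)
```

Two limiting cases illustrate the double robustness described above. If the outcome predictions match the observed outcomes, the weighted residuals vanish and the propensity scores no longer matter. If the outcome predictions are set to zero, the expression reduces to plain inverse probability weighting, which relies on the propensity scores alone.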