Practical and ethical constraints often require the use of observational data for causal inference, particularly in medicine and the social sciences. Observational datasets, however, are prone to confounding, which can compromise the validity of causal conclusions. While biases can be corrected when the underlying causal graph is known, this is rarely the case in practice. A common strategy is to adjust for all available covariates, but this can yield biased treatment effect estimates, especially when post-treatment or unobserved variables are present. We propose RAMEN, an algorithm that produces unbiased treatment effect estimates by leveraging the heterogeneity of multiple data sources, without the need to know or learn the underlying causal graph. Notably, RAMEN achieves doubly robust identification: it identifies the treatment effect whenever the causal parents of either the treatment or the outcome are observed, and the node whose parents are observed satisfies an invariance assumption. Empirical evaluations on synthetic and real-world datasets show that our approach outperforms existing methods.
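The claim that adjusting for all available covariates can bias the estimate when a post-treatment variable is present can be illustrated with a small simulation. The sketch below is not RAMEN and is not taken from the paper: the data-generating process, the variable names (`x`, `t`, `y`, `m`), the effect size `tau`, and the helper `ols_treatment_coef` are all assumptions chosen for illustration. It compares an ordinary least squares estimate that adjusts only for an observed confounder with one that also adjusts for a variable caused by both the treatment and the outcome.

```python
import numpy as np


def ols_treatment_coef(y, t, covariates):
    """Coefficient on t from an OLS regression of y on [1, t, covariates]."""
    design = np.column_stack([np.ones_like(y), t] + list(covariates))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(n=20000, tau=2.0, seed=0):
    # Hypothetical linear data-generating process (an assumption, not the paper's).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                          # observed confounder
    t = (x + rng.normal(size=n) > 0).astype(float)  # treatment depends on x
    y = tau * t + x + rng.normal(size=n)            # outcome; true effect is tau
    m = t + y + rng.normal(size=n)                  # post-treatment: child of t and y
    return {
        "adjust_x": ols_treatment_coef(y, t, [x]),
        "adjust_x_and_m": ols_treatment_coef(y, t, [x, m]),
    }


estimates = simulate()
print(estimates)
```

Under these assumptions, adjusting for `x` alone recovers a value close to the true effect of 2, whereas additionally adjusting for `m` pulls the coefficient toward roughly 0.5, because conditioning on a common child of treatment and outcome opens a spurious path between them.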