Practical and ethical constraints often require the use of observational data for causal inference, particularly in medicine and the social sciences. Yet observational datasets are prone to confounding, which can compromise the validity of causal conclusions. Biases can be corrected if the underlying causal graph is known, but this is rarely the case in practice. A common strategy is to adjust for all available covariates, yet this approach can yield biased treatment effect estimates, especially when post-treatment or unobserved variables are present. We propose RAMEN, an algorithm that produces unbiased treatment effect estimates by leveraging the heterogeneity of multiple data sources, without needing to know or learn the underlying causal graph. Notably, RAMEN achieves doubly robust identification: it can identify the treatment effect whenever either the causal parents of the treatment or those of the outcome are observed, provided that the node whose parents are observed satisfies an invariance assumption. Empirical evaluations on synthetic and real-world datasets show that our approach outperforms existing methods.
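The abstract argues that adjusting for all covariates can bias treatment effect estimates when post-treatment variables are present. The short simulation below illustrates that point. It is not RAMEN. It is a minimal sketch that assumes a hypothetical linear-Gaussian structural causal model: `x` confounds treatment `t` and outcome `y`, the true effect of `t` on `y` is 1.5, and `c` is a post-treatment collider caused by both `t` and `y`. Each estimate is the ordinary least squares coefficient on `t` from `numpy.linalg.lstsq`, computed under three different adjustment sets. The helper `ols_coef` and all variable names are illustrative.

```python
import numpy as np


def ols_coef(y, *covariates):
    """Return the OLS coefficient on the first covariate (intercept included)."""
    design = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical SCM (an assumption for illustration only):
#   x -> t, x -> y, t -> y (true effect 1.5), and t -> c <- y (post-treatment collider).
x = rng.normal(size=n)
t = x + rng.normal(size=n)
y = 1.5 * t + x + rng.normal(size=n)
c = t + y + rng.normal(size=n)

naive = ols_coef(y, t)           # no adjustment: confounded by x (population value 2.0)
adj_x = ols_coef(y, t, x)        # adjust for the treatment's parent x (population value 1.5)
all_covs = ols_coef(y, t, x, c)  # adjust for everything, including c (population value 0.25)

print(f"naive={naive:.3f}  adjust x={adj_x:.3f}  adjust x and c={all_covs:.3f}")
```

In this toy model, `x` is both the parent set of `t` and, apart from `t` itself, the parent set of `y`. Adjusting for either parent set therefore gives the same, correct estimate. Adding the post-treatment variable `c` opens a collider path and shifts the estimate far from 1.5.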