We address the problem of integrating data from multiple, possibly biased, observational and interventional studies in order to compute counterfactuals in structural causal models. We start from the case of a single observational dataset affected by selection bias, and show that the likelihood of the available data has no local maxima. This enables us to use the causal expectation-maximisation scheme to compute approximate bounds for partially identifiable counterfactual queries, which are the focus of this paper. We then show how the same approach solves the general case of multiple datasets, whether interventional or observational, biased or unbiased, by mapping it to the single-dataset case via graphical transformations. Systematic numerical experiments and a case study on palliative care demonstrate the effectiveness and accuracy of our approach, and suggest that integrating heterogeneous data helps to obtain informative bounds under partial identifiability.