Causal abstraction (CA) theory establishes formal criteria for relating multiple structural causal models (SCMs) at different levels of granularity by defining maps between them. These maps have significant relevance for real-world challenges such as synthesizing causal evidence from multiple experimental environments, learning causally consistent representations at different resolutions, and linking interventions across multiple SCMs. In this work, we propose COTA, the first method to learn abstraction maps from observational and interventional data without assuming complete knowledge of the underlying SCMs. In particular, we introduce a multi-marginal Optimal Transport (OT) formulation that enforces do-calculus causal constraints, together with a cost function that relies on interventional information. We extensively evaluate COTA on synthetic and real-world problems, and showcase its advantages over non-causal, independent, and aggregated COTA formulations. Finally, we demonstrate the efficiency of our method as a data augmentation tool by comparing it against the state-of-the-art CA learning framework, which assumes fully specified SCMs, on a real-world downstream task.