Constraint-based causal discovery methods leverage conditional independence tests to infer causal relationships in a wide variety of applications. Like most machine learning methods, existing work focuses on $\textit{independent and identically distributed}$ (i.i.d.) data. However, it is known that even with infinite i.i.d. data, constraint-based methods can only identify causal structures up to broad Markov equivalence classes, which poses a fundamental limitation for causal discovery. In this work, we observe that exchangeable data contains richer conditional independence structure than i.i.d. data, and show how this richer structure can be leveraged for causal discovery. We first present causal de Finetti theorems, which state that exchangeable distributions with certain non-trivial conditional independences can always be represented as $\textit{independent causal mechanism (ICM)}$ generative processes. We then present our main identifiability theorem, which shows that, given data from an ICM generative process, its unique causal structure can be identified by performing conditional independence tests. Finally, we develop a causal discovery algorithm and demonstrate its applicability to inferring causal relationships from multi-environment data. Our code and models are publicly available at: https://github.com/syguo96/Causal-de-Finetti
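The identifiability idea can be illustrated in the simplest bivariate case. The sketch below is not the authors' algorithm (that lives in the repository above). It is a minimal simulation under stated assumptions. It assumes a hypothetical linear-Gaussian ICM model for $X \to Y$: each environment $e$ independently draws $\theta_e$ (the mean of $X$) and $\psi_e$ (the intercept of $Y$ given $X$), and two exchangeable samples are taken per environment. In the corresponding graph ($\theta \to X_i$, $\psi \to Y_i$, $X_i \to Y_i$), d-separation gives $X_1 \perp Y_2 \mid X_2$, while $Y_1 \perp X_2 \mid Y_2$ generally fails. The reverse pattern would hold for $Y \to X$, and neither pattern is available from a single i.i.d. sample of $(X, Y)$. The names `simulate_icm`, `partial_corr_test` and `bivariate_ci_tests` are hypothetical. The Fisher-z partial-correlation test is a swapped-in choice that only detects linear dependence.

```python
import math

import numpy as np


def partial_corr_test(a, b, c):
    """Fisher-z test of a _||_ b | c for 1-D arrays (assumes linear-Gaussian data).

    Returns (partial correlation, two-sided p-value).
    """
    n = len(a)
    design = np.column_stack([np.ones(n), c])  # intercept + conditioning variable
    # Residualise a and b on c by ordinary least squares.
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    r = float(np.corrcoef(res_a, res_b)[0, 1])
    z = math.atanh(r) * math.sqrt(n - 1 - 3)  # one conditioning variable
    p = math.erfc(abs(z) / math.sqrt(2))
    return r, p


def simulate_icm(n_envs, a=1.0, seed=0):
    """Hypothetical ICM model X -> Y with two exchangeable samples per environment.

    theta_e (mean of X) and psi_e (intercept of Y | X) are drawn independently
    per environment; environments are i.i.d., samples within one are exchangeable.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 2.0, size=n_envs)  # Var(theta) = 4
    psi = rng.normal(0.0, 1.0, size=n_envs)    # Var(psi) = 1
    x = theta[:, None] + rng.normal(size=(n_envs, 2))
    y = a * x + psi[:, None] + rng.normal(size=(n_envs, 2))
    return x, y  # each of shape (n_envs, 2): sample 1 and sample 2


def bivariate_ci_tests(x, y):
    """Test X_1 _||_ Y_2 | X_2 (implied by X -> Y) and
    Y_1 _||_ X_2 | Y_2 (implied by Y -> X) on paired samples."""
    xy = partial_corr_test(x[:, 0], y[:, 1], x[:, 1])
    yx = partial_corr_test(y[:, 0], x[:, 1], y[:, 1])
    return xy, yx


x, y = simulate_icm(n_envs=5000)
(r_xy, p_xy), (r_yx, p_yx) = bivariate_ci_tests(x, y)
print(f"X1 _||_ Y2 | X2: r={r_xy:.3f}, p={p_xy:.3g}")
print(f"Y1 _||_ X2 | Y2: r={r_yx:.3f}, p={p_yx:.3g}")
```

For data generated with $X \to Y$, the first partial correlation should be near zero and the second clearly non-zero. The variances were chosen deliberately. If every parameter and noise variance is set to 1, the two open paths from $Y_1$ to $X_2$ cancel exactly for any coefficient `a`. The second partial correlation is then zero, which is a faithfulness violation that this linear test cannot detect.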