Causal discovery from i.i.d. observational data is known to be ill-posed in general. We show that, given access to the distribution induced by a structural causal model together with additional data from (in the best case) \textit{only two} environments that differ sufficiently in their noise statistics, the causal graph is uniquely identifiable. Notably, this is the first result in the literature that guarantees recovery of the entire causal graph from a constant number of environments under arbitrary nonlinear mechanisms. Our only constraint is that the noise terms be Gaussian; however, we propose potential ways to relax this requirement. Of independent interest, we expand on the well-known duality between independent component analysis (ICA) and causal discovery. Recent advances have shown that nonlinear ICA can be solved from multiple environments, provided there are at least as many environments as sources; we show that the same can be achieved for causal discovery with much less auxiliary information.
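To make the setting concrete, the following minimal sketch simulates a toy two-variable structural causal model observed in two environments that differ only in their Gaussian noise scales. It illustrates the data-generating setup, not the identification procedure. The function name `sample_scm`, the `tanh` mechanism, the additive form of the noise, and the specific noise scales are all hypothetical choices made for illustration; none of them is taken from the abstract.

```python
import math
import random


def sample_scm(n, noise_stds, seed=0):
    """Draw n samples from a toy SCM X1 -> X2 with Gaussian noise.

    Assumption for illustration: the noise enters additively and the
    mechanism is tanh; the abstract only requires Gaussian noise and
    allows arbitrary nonlinear mechanisms.
    """
    rng = random.Random(seed)
    std1, std2 = noise_stds
    samples = []
    for _ in range(n):
        x1 = rng.gauss(0.0, std1)                        # root variable: pure noise
        x2 = math.tanh(2.0 * x1) + rng.gauss(0.0, std2)  # nonlinear effect of x1 plus noise
        samples.append((x1, x2))
    return samples


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Two hypothetical environments: same graph and mechanism, different noise scales.
env_a = sample_scm(5000, noise_stds=(1.0, 0.5), seed=1)
env_b = sample_scm(5000, noise_stds=(2.0, 1.5), seed=2)

# The causal mechanism is shared, so only the noise statistics change.
var_x1_a = variance([x1 for x1, _ in env_a])
var_x1_b = variance([x1 for x1, _ in env_b])
resid_a = variance([x2 - math.tanh(2.0 * x1) for x1, x2 in env_a])
resid_b = variance([x2 - math.tanh(2.0 * x1) for x1, x2 in env_b])
```

In this sketch the graph and the mechanism are held fixed across `env_a` and `env_b`, so any difference between the two datasets comes from the noise statistics alone. This is the kind of auxiliary variation the abstract refers to.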