Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that, given access to the distribution induced by a structural causal model and additional data from (in the best case) \textit{only two} environments that differ sufficiently in their noise statistics, the causal graph is uniquely identifiable. Notably, this is the first result in the literature that guarantees recovery of the entire causal graph with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is that the noise terms be Gaussian; however, we propose potential ways to relax this requirement. Of independent interest, we expand on the well-known duality between independent component analysis (ICA) and causal discovery. Recent advances have shown that nonlinear ICA can be solved given multiple environments, at least as many as the number of sources; we show that the same can be achieved for causal discovery with much less auxiliary information.