Because causal ground truth is extremely rare, causal discovery algorithms are commonly evaluated only on simulated data. This is concerning, given that simulations reflect preconceptions about the generating process, such as noise distributions and model classes. In this work, we propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth. Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. We prove that detecting incompatibilities can falsify causal relations that were wrongly inferred because of violated assumptions or finite-sample errors. Although passing such compatibility tests is only a necessary criterion for good performance, we argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution. We also demonstrate experimentally that detecting incompatibilities can aid causal model selection.
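To make the idea of comparing causal graphs learned on different subsets of variables concrete, the following is a minimal sketch of one possible check. It is a simplified stand-in, not the compatibility notion defined in this work: it assumes each learned graph is a DAG given as a dictionary mapping a variable to the set of its children, and it flags a conflict whenever the two graphs disagree about whether one shared variable is an ancestor of another. The function names `descendants` and `ancestral_conflicts` are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def descendants(graph, node):
    """Return every variable reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def variables(graph):
    """Collect all variables that appear as a parent or a child."""
    return set(graph) | {c for children in graph.values() for c in children}


def ancestral_conflicts(graph_a, graph_b):
    """List ordered pairs (x, y) of shared variables for which exactly one
    of the two graphs claims that x is an ancestor of y.

    Assumption (illustrative only): both graphs are meant to describe the
    same underlying system, so their ancestral relations should agree on
    the variables they share.
    """
    shared = variables(graph_a) & variables(graph_b)
    conflicts = []
    for x, y in permutations(sorted(shared), 2):
        if (y in descendants(graph_a, x)) != (y in descendants(graph_b, x)):
            conflicts.append((x, y))
    return conflicts


# Graph learned on {X, Y, Z}: X -> Y -> Z
full = {"X": {"Y"}, "Y": {"Z"}, "Z": set()}
# Graph learned on the subset {X, Z}: Z -> X (reversed direction)
subset_reversed = {"Z": {"X"}, "X": set()}
# Graph learned on the subset {X, Z}: X -> Z (agrees with `full`)
subset_agreeing = {"X": {"Z"}, "Z": set()}

print(ancestral_conflicts(full, subset_reversed))  # [('X', 'Z'), ('Z', 'X')]
print(ancestral_conflicts(full, subset_agreeing))  # []
```

An empty result in this sketch only means that no disagreement was detected, in line with the point that passing a compatibility test is a necessary criterion rather than a sufficient one.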