Causal discovery methods based on the PC algorithm are provably sound if all structural assumptions hold and all conditional independence tests return correct results. This idealized setting rarely holds for real data. In this work, we first analyze how local errors can propagate through the output graph of a PC-based method, showing how consequential seemingly innocuous errors can become. Next, we introduce coherency scores to detect assumption violations and small-sample errors in the absence of a ground truth. These scores require no statistical tests beyond those already executed by the causal discovery algorithm. The errors our approach detects extend the set of errors detectable by comparable existing methods. We position our computationally cheap global error detection and quantification scores as a bridge between computationally expensive global methods based on answer set programming and less expensive local error detection methods. We evaluate the scores on simulated and real-world datasets.