Constraint-based causal discovery algorithms learn part of the causal graph structure by systematically testing conditional independences observed in the data. These algorithms, such as the PC algorithm and its variants, rely on graphical characterizations of the so-called equivalence class of causal graphs proposed by Pearl. However, they struggle when data is limited, because conditional independence tests quickly lose statistical power, especially when the conditioning set is large. To address this, we propose restricting causal discovery to conditional independence tests whose conditioning set size is upper bounded by some integer $k$. Existing graphical characterizations of the equivalence classes of causal graphs do not apply when not all conditional independence statements can be leveraged. We first define the notion of $k$-Markov equivalence: two causal graphs are $k$-Markov equivalent if they entail the same conditional independence constraints among those whose conditioning set size is at most $k$. We then propose a novel representation that graphically characterizes $k$-Markov equivalence between two causal graphs, and a sound constraint-based algorithm, called the $k$-PC algorithm, for learning this equivalence class. Finally, we conduct synthetic and semi-synthetic experiments to demonstrate that the $k$-PC algorithm enables more robust causal discovery in the small-sample regime than the baseline algorithms.
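To make the definition of $k$-Markov equivalence concrete, the following is a minimal brute-force sketch and not the graphical characterization or the $k$-PC algorithm of this work. It assumes that "entailed" conditional independences are read off a DAG by d-separation, enumerates every statement with a conditioning set of size at most $k$, and compares the resulting sets for two graphs. The function names and the parent-set encoding of a DAG are hypothetical choices made for illustration.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors.
    `dag` maps each node to the set of its parents (illustrative encoding)."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in dag[v]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, z):
    """Test whether x and y are d-separated given z, using the standard
    moralized-ancestral-graph criterion."""
    keep = ancestors(dag, {x, y} | set(z))
    adj = {v: set() for v in keep}
    for v in keep:
        parents = sorted(dag[v])  # all parents lie in `keep` (ancestral set)
        for p in parents:         # drop edge directions
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents, 2):  # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    blocked, seen, stack = set(z), {x}, [x]
    while stack:                  # search for an x-y path avoiding z
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True

def bounded_ci_statements(dag, k):
    """All triples (x, y, z) with |z| <= k such that x and y are d-separated by z."""
    nodes = sorted(dag)
    found = set()
    for x, y in combinations(nodes, 2):
        rest = [v for v in nodes if v not in (x, y)]
        for size in range(k + 1):
            for z in combinations(rest, size):
                if d_separated(dag, x, y, z):
                    found.add((x, y, z))
    return found

def k_markov_equivalent(g1, g2, k):
    """Brute-force check of the definition; exponential in k, for small graphs only."""
    return set(g1) == set(g2) and bounded_ci_statements(g1, k) == bounded_ci_statements(g2, k)

# Toy graphs: a chain a -> b -> c and a fully connected DAG on the same nodes.
chain = {"a": set(), "b": {"a"}, "c": {"b"}}
full = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
print(k_markov_equivalent(chain, full, 0))
print(k_markov_equivalent(chain, full, 1))
```

In this toy example the chain and the fully connected graph entail no marginal independences, so they cannot be told apart with $k = 0$; only at $k = 1$ does the chain's independence of `a` and `c` given `b` separate them. This illustrates why bounding the conditioning set size yields coarser equivalence classes than ordinary Markov equivalence.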