Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear whether algorithms with better complexity exist without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, on semi-synthetic gene-expression data, and on real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of the number of conditional independence tests needed.
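
One way to read the "up to a logarithmic factor" statement is to write both bounds with the same base, which makes the gap between the exponents explicit. This is a sketch under the assumption that logarithms are taken base 2 and that the constants hidden in $\mathcal{O}(\cdot)$ and $\Omega(\cdot)$ are absorbed:

$$
\underbrace{2^{\Omega(s)}}_{\text{lower bound}} \;\le\; \#\,\text{tests} \;\le\; \underbrace{p^{\mathcal{O}(s)} = 2^{\mathcal{O}(s \log p)}}_{\text{upper bound}}
$$

Under this reading, the exponent of the upper bound, $\mathcal{O}(s \log p)$, exceeds the exponent of the lower bound, $\Omega(s)$, by a factor of at most $\mathcal{O}(\log p)$.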