Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, the complementary task of testing causal relationships has remained largely unexplored. Learning involves recovering the Markov equivalence class (MEC) of the underlying causal graph from observational data. The testing counterpart addresses the following critical question: given a specific MEC and observational data from some causal graph, can we determine whether the data-generating causal graph belongs to the given MEC? We explore constraint-based testing methods by establishing bounds on the required number of conditional independence tests. Our bounds are in terms of the size $s$ of the maximum undirected clique of the given MEC. In the worst case, we show a lower bound of $\exp(\Omega(s))$ independence tests. We then give an algorithm that resolves the task with $\exp(O(s))$ tests, matching our lower bound. Compared to the learning problem, where algorithms often use a number of independence tests that is exponential in the maximum in-degree, this shows that testing is relatively easier. In particular, testing requires exponentially fewer independence tests in graphs featuring high in-degrees and small clique sizes. Additionally, using the DAG associahedron, we provide a geometric interpretation of testing versus learning and discuss how our testing result can aid learning.