Causal discovery, i.e., learning the causal graph from data, is often the first step toward identifying and estimating causal effects, a key requirement in numerous scientific domains. It is hampered by two main challenges: limited data leads to errors in statistical testing, and the computational complexity of the learning task is daunting. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. The presence and identification of removable variables enable recursive approaches to causal discovery, a promising solution that helps address both challenges by successively reducing the problem size. This reduction not only reduces the size of the conditioning sets in each conditional independence (CI) test, leading to fewer errors, but also significantly decreases the number of required CI tests. The worst-case performance of these methods nearly matches the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. We also include a comprehensive literature review that compares the computational complexity of our methods with that of existing approaches and showcases their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. The package is designed for practitioners and researchers interested in applying these methods in practical scenarios. It is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com.