Identifying causal structure is central to many fields ranging from strategic decision-making to biology and economics. In this work, we propose CD-UCT, a model-based reinforcement learning method for causal discovery based on tree search that builds directed acyclic graphs incrementally. We also formalize and prove the correctness of an efficient algorithm for excluding edges that would introduce cycles, which enables deeper discrete search and sampling in DAG space. The proposed method can be applied broadly to causal Bayesian networks with both discrete and continuous random variables. We conduct a comprehensive evaluation on synthetic and real-world datasets, showing that CD-UCT substantially outperforms the state-of-the-art model-free reinforcement learning technique and greedy search, constituting a promising advancement for combinatorial methods.
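To illustrate what excluding cycle-inducing edges involves when a DAG is built one edge at a time, the following is a minimal sketch. It is not the algorithm formalized in this work, whose details are not given in the abstract. It assumes a standard reachability-matrix approach: an edge u -> v is disallowed exactly when v already reaches u. The names `IncrementalDAG`, `valid_edges`, and `add_edge` are hypothetical and chosen only for this example.

```python
import numpy as np


class IncrementalDAG:
    """Hypothetical helper: grows a DAG edge by edge and tracks legal additions."""

    def __init__(self, n):
        self.n = n
        self.adj = np.zeros((n, n), dtype=bool)
        # reach[a, b] is True when a directed path from a to b exists.
        # Every node trivially reaches itself, which also rules out self-loops.
        self.reach = np.eye(n, dtype=bool)

    def valid_edges(self):
        # Adding u -> v closes a cycle iff v already reaches u (reach[v, u]).
        # Edges that are already present are excluded as well.
        mask = ~self.reach.T & ~self.adj
        return [(int(u), int(v)) for u, v in np.argwhere(mask)]

    def add_edge(self, u, v):
        if self.reach[v, u] or self.adj[u, v]:
            raise ValueError(f"edge {u}->{v} is a duplicate or would create a cycle")
        self.adj[u, v] = True
        # Everything that reaches u now also reaches everything reachable from v.
        ancestors = self.reach[:, u].copy()
        descendants = self.reach[v, :].copy()
        self.reach |= ancestors[:, None] & descendants[None, :]


dag = IncrementalDAG(3)
dag.add_edge(0, 1)
dag.add_edge(1, 2)
remaining = dag.valid_edges()  # only 0 -> 2 keeps the graph acyclic
```

Under this assumption, each edge addition costs O(n^2) for the reachability update, and the set of legal actions is available as a single boolean mask, which is the kind of operation a search or sampling procedure over DAG space would call at every step.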