Learning directed acyclic graphs (DAGs) to identify the causal relations underlying observational data is crucial but challenging. Recently, topology-based methods have emerged as a two-step approach to DAG discovery: they first learn a topological ordering of the variables and then eliminate redundant edges, while ensuring that the graph remains acyclic. One limitation is that these methods generate numerous spurious edges that must subsequently be pruned. To overcome this limitation, we propose an improvement to topology-based methods that introduces limited time series data consisting of only two cross-sectional records, which need not be adjacent in time and may be collected at flexible time points. By incorporating conditional instrumental variables as exogenous interventions, we aim to identify the descendant nodes of each variable. Building on this idea, we propose a hierarchical topological ordering algorithm with conditional independence test (HT-CIT), which efficiently learns sparse DAGs over a smaller search space than other popular approaches and greatly reduces the number of edges that need to be pruned. Empirical results on synthetic and real-world datasets demonstrate the superiority of the proposed HT-CIT algorithm.
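To illustrate why a full topological ordering leaves many spurious edges to prune, the following minimal sketch compares the candidate edge set implied by a full ordering with the one implied by known descendant sets. It is not the HT-CIT algorithm: the four-variable graph, the variable names, and the helper functions `edges_from_ordering` and `edges_from_descendants` are hypothetical, and the descendant sets are simply assumed to be given rather than estimated with conditional independence tests.

```python
from itertools import combinations


def edges_from_ordering(order):
    """Candidate edges implied by a full topological ordering.

    Every earlier variable may point to every later one, so an ordering
    over d variables yields d * (d - 1) / 2 candidate edges.
    """
    return {(a, b) for a, b in combinations(order, 2)}


def edges_from_descendants(descendants):
    """Candidate edges when each variable's descendant set is known.

    Only ancestor -> descendant pairs remain as candidates.
    """
    return {(a, b) for a, ds in descendants.items() for b in ds}


# Hypothetical ground truth (an assumption for illustration): A -> B, A -> C, B -> D.
true_edges = {("A", "B"), ("A", "C"), ("B", "D")}

# One valid topological ordering of that graph.
order = ["A", "B", "C", "D"]

# Descendant sets of the same graph, assumed to be already identified.
descendants = {"A": {"B", "C", "D"}, "B": {"D"}, "C": set(), "D": set()}

full = edges_from_ordering(order)
hier = edges_from_descendants(descendants)

# Edges that a pruning step would still have to remove in each case.
spurious_full = full - true_edges
spurious_hier = hier - true_edges

print(len(full), len(spurious_full))
print(len(hier), len(spurious_hier))
```

In this toy case the descendant-based candidate set is a subset of the ordering-based one and still contains every true edge, so the pruning step has fewer edges to examine.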