Learning the structure of causal directed acyclic graphs (DAGs) has wide applications across machine learning and artificial intelligence. In the high-dimensional setting, however, it is difficult to obtain good empirical and theoretical results without strong and often restrictive assumptions. It is also questionable whether all of the variables purported to belong to the network are observable. It is therefore of interest to restrict attention to a subset of the variables for which relevant and reliable inferences can be made. In practice, researchers in various disciplines can usually select a set of target nodes in the network for causal discovery. This paper develops a new constraint-based method for estimating the local structure around multiple user-specified target nodes, enabling coordination in structure learning between neighborhoods. Our method facilitates causal discovery without learning the entire DAG structure. We establish consistency results for our algorithm with respect to the local neighborhood structure of the target nodes in the true graph. Experimental results on synthetic and real-world data show that our algorithm learns the neighborhood structures more accurately and at much lower computational cost than standard methods that estimate the entire DAG. An R package implementing our methods is available at https://github.com/stephenvsmith/CML.
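To make the idea of constraint-based local learning concrete, the following is a minimal Python sketch of how the neighbors of a few target nodes might be estimated with conditional independence tests alone. It is not the algorithm of this paper and not the API of the CML package: the function names (`fisher_z_independent`, `local_skeleton`, `simulate_chain`), the Gaussian Fisher z test, and the restriction of conditioning sets to the target's current candidate neighbors are all assumptions chosen for illustration.

```python
import itertools
import math

import numpy as np


def fisher_z_independent(corr, n, i, j, cond, alpha=0.01):
    """Fisher z test (assumes roughly Gaussian data).

    Returns True when X_i and X_j look conditionally independent
    given the variables in `cond` at significance level `alpha`.
    """
    idx = [i, j, *cond]
    sub = corr[np.ix_(idx, idx)]
    prec = np.linalg.pinv(sub)
    # Partial correlation of i and j given cond, from the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def local_skeleton(data, targets, max_cond=2, alpha=0.01):
    """Hypothetical helper: estimate the adjacent nodes of each target.

    Only pairs that involve a target are ever tested, so the full
    graph over all variables is never estimated.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    neighbors = {}
    for t in targets:
        adj = set(range(p)) - {t}
        for size in range(max_cond + 1):
            for j in sorted(adj):
                others = sorted(adj - {j})
                for cond in itertools.combinations(others, size):
                    if fisher_z_independent(corr, n, t, j, cond, alpha):
                        adj.discard(j)  # a separating set was found
                        break
        neighbors[t] = adj
    return neighbors


def simulate_chain(n=5000, seed=0):
    """Toy linear Gaussian data from the chain X0 -> X1 -> X2."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)
    return np.column_stack([x0, x1, x2])
```

On data drawn from `simulate_chain`, calling `local_skeleton(data, targets=[0, 1], max_cond=1)` runs tests only for pairs that include node 0 or node 1. The sketch recovers undirected adjacencies only; orienting edges and coordinating information between the neighborhoods of different targets, which the abstract highlights, are beyond what it attempts.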