We consider the problem of learning a directed graph $G^\star$ from observational data. We assume that the distribution generating the samples is Markov and faithful to $G^\star$ and that there are no unobserved variables. We make no further assumptions about the graph or the distribution of the variables; in particular, we allow directed cycles in $G^\star$ and work in the fully non-parametric setting. Given the set of conditional independence statements satisfied by the distribution, we aim to find a directed graph that satisfies the same $d$-separation statements as $G^\star$. We propose a hybrid approach consisting of two steps. First, we find a partially ordered partition of the vertices of $G^\star$ by greedily optimizing a certain score. We prove that any optimal partition uniquely characterizes the Markov equivalence class of $G^\star$. Second, given an optimal partition, we propose an algorithm for constructing a graph in the Markov equivalence class of $G^\star$ whose strongly connected components correspond to the elements of the partition and are partially ordered according to the partial order of the partition. Our algorithm comes in two versions: one that is provably correct and another that runs fast in practice.
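To make the central object concrete, the following minimal Python sketch computes the partition of a known directed graph, possibly with cycles, into strongly connected components, together with the ordering relations between blocks induced by the edges. It is not the learning procedure described above, which works from conditional independence statements rather than from a known graph. The function names `strongly_connected_components` and `ordered_partition` are hypothetical, and Kosaraju's algorithm is used here only as a standard way to find the components.

```python
from collections import defaultdict


def strongly_connected_components(vertices, edges):
    """Return the SCCs of a directed graph as a list of frozensets (Kosaraju)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # First pass: iterative DFS, recording vertices in order of completion.
    visited, finish_order = set(), []
    for root in vertices:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                # All successors explored: this vertex is finished.
                finish_order.append(node)
                stack.pop()

    # Second pass: search the reversed graph in reverse finishing order.
    assigned, components = set(), []
    for root in reversed(finish_order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in pred[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(frozenset(component))
    return components


def ordered_partition(vertices, edges):
    """Return (blocks, relations): the SCC partition and the pairs (A, B)
    of distinct blocks joined by an edge from A to B. The partial order
    on blocks is the transitive closure of these pairs."""
    components = strongly_connected_components(vertices, edges)
    block_of = {v: block for block in components for v in block}
    relations = {
        (block_of[u], block_of[v])
        for u, v in edges
        if block_of[u] != block_of[v]
    }
    return set(components), relations


# A graph with two directed cycles, 1 <-> 2 and 3 <-> 4, linked by 2 -> 3.
blocks, relations = ordered_partition(
    [1, 2, 3, 4], [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)]
)
```

In this example the two cycles form the two blocks of the partition, and the single edge from vertex 2 to vertex 3 places the block containing 1 and 2 before the block containing 3 and 4.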