We present a consistent and highly scalable local approach for learning the causal structure of a linear Gaussian polytree from interventional data with known intervention targets. Our method first learns the skeleton of the polytree and then orients its edges. The output is a CPDAG representing the interventional equivalence class of the polytree underlying the true distribution. Both the skeleton and the orientation recovery procedures rely only on second-order statistics and low-dimensional marginal distributions. We assess the performance of our method under different scenarios on synthetic data sets and apply it to learn a polytree from an interventional gene expression data set. Our simulation studies demonstrate that the approach is fast, achieves good accuracy in terms of structural Hamming distance, and handles problems with thousands of nodes.
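To make the skeleton step concrete, the following is a minimal sketch of one standard way to recover a tree-shaped skeleton from second-order statistics alone: a Chow-Liu style maximum-weight spanning tree over absolute sample correlations. The abstract does not specify the actual procedure, so this technique, the function names `simulate_polytree` and `chow_liu_skeleton`, the edge weight of 0.8, and the five-node example graph are all assumptions made for illustration. The sketch uses purely observational samples and leaves out both the interventional data and the edge orientation step.

```python
import numpy as np


def simulate_polytree(n_samples, edges, n_nodes, weight=0.8, seed=0):
    """Sample from a linear Gaussian SEM with unit-variance noise.

    `edges` lists (parent, child) pairs; this sketch assumes every parent
    index is smaller than its child index, so index order is topological.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_nodes))  # start from the noise terms
    for child in range(n_nodes):
        for parent, c in edges:
            if c == child:
                X[:, child] += weight * X[:, parent]
    return X


def chow_liu_skeleton(X):
    """Maximum-weight spanning tree on |sample correlation| (Kruskal)."""
    d = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # All node pairs, strongest absolute correlation first.
    pairs = sorted(
        ((i, j) for i in range(d) for j in range(i + 1, d)),
        key=lambda p: corr[p[0], p[1]],
        reverse=True,
    )
    root = list(range(d))  # union-find forest used to reject cycles

    def find(u):
        while root[u] != u:
            root[u] = root[root[u]]  # path halving
            u = root[u]
        return u

    skeleton = set()
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it closes no cycle
            root[ri] = rj
            skeleton.add((i, j))
    return skeleton


# Hypothetical example polytree with a collider at node 2: 0 -> 2 <- 1, 2 -> 3 -> 4.
true_edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
X = simulate_polytree(n_samples=5000, edges=true_edges, n_nodes=5)
skeleton = chow_liu_skeleton(X)
print(sorted(skeleton))
```

The sketch works on pairwise correlations only, which is in the spirit of relying on second-order statistics and low-dimensional marginals. In a polytree, the correlation between two non-adjacent nodes is either zero (when the connecting path contains a collider) or a product of edge correlations along the path, so in the population it never exceeds the correlation of any edge on that path in absolute value.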