To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large-scale data. Unfortunately, the space of all possible causal graphs is enormous, so searching it scalably and accurately for the best fit to the data is a challenge. In principle, we could substantially shrink the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding whether two variables are adjacent in a causal graph may require an exponential number of tests. Here we build the Differentiable Adjacency Test (DAT), a scalable and flexible method for evaluating whether two variables are adjacent in a causal graph. DAT replaces an exponential number of tests with a provably equivalent relaxed problem, which it then solves by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state-of-the-art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large-scale RNA sequencing data.
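
To make the exponential cost concrete, the following is a minimal sketch of the naive adjacency check that DAT is designed to avoid. It is not the DAT method. It assumes linear-Gaussian data and uses a Fisher z-test on partial correlations as the conditional independence test; the function names, the toy chain, and the threshold `z_crit` are hypothetical choices made for illustration. Two variables are declared adjacent only if no conditioning set drawn from the remaining variables renders them independent, which in the worst case means testing all 2^(n-2) subsets.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j, *cond]
    precision = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def is_independent(corr, n_samples, i, j, cond, z_crit=4.0):
    """Fisher z-test; z_crit is an arbitrary, conservative threshold (assumption)."""
    r = float(np.clip(partial_corr(corr, i, j, cond), -0.999999, 0.999999))
    z = math.sqrt(n_samples - len(cond) - 3) * abs(math.atanh(r))
    return z < z_crit


def brute_force_adjacent(data, i, j):
    """Return (adjacent, number of tests run) by enumerating conditioning sets."""
    n_samples, n_vars = data.shape
    corr = np.corrcoef(data, rowvar=False)
    others = [k for k in range(n_vars) if k not in (i, j)]
    n_tests = 0
    for size in range(len(others) + 1):
        for cond in itertools.combinations(others, size):
            n_tests += 1
            if is_independent(corr, n_samples, i, j, cond):
                return False, n_tests  # a separating set exists: not adjacent
    return True, n_tests  # no subset separates i and j: adjacent


# Toy data (assumption): a chain X0 -> X1 -> X2 plus an unrelated variable X3.
rng = np.random.default_rng(0)
noise = rng.standard_normal((5000, 4))
x0 = noise[:, 0]
x1 = x0 + noise[:, 1]
x2 = x1 + noise[:, 2]
x3 = noise[:, 3]
data = np.column_stack([x0, x1, x2, x3])

print(brute_force_adjacent(data, 0, 1))  # X0 and X1 share an edge
print(brute_force_adjacent(data, 0, 2))  # X0 and X2 are separated by X1
```

Confirming that a pair is adjacent forces the loop through every subset of the other variables, so with 1000 variables a single pair could require up to 2^998 tests. This is the cost that motivates replacing the enumeration with a relaxed problem.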