Learning causal directed acyclic graphs (DAGs) from data is complicated by a lack of identifiability and by the combinatorial space of solutions. Recent work has made score-based structure learning of DAGs from observational data more tractable, but the resulting methods are sensitive to the structure of the exogenous error variances. Conversely, learning the exogenous variance structure from observational data requires prior knowledge of the causal structure. Motivated by new biological technologies that link highly parallel gene interventions to a high-dimensional observation, we present $\texttt{dotears}$ [doo-tairs], a scalable structure learning framework that leverages observational and interventional data to infer a single causal structure through continuous optimization. $\texttt{dotears}$ exploits the predictable structural consequences of interventions to estimate the exogenous error structure directly, bypassing this circular estimation problem. We extend previous work to show, both empirically and analytically, that the inferences of previous methods are driven by the exogenous variance structure, whereas $\texttt{dotears}$ is robust to it. Across varied simulations of large random DAGs, $\texttt{dotears}$ outperforms state-of-the-art methods in structure estimation. Finally, we show that $\texttt{dotears}$ is a provably consistent estimator of the true DAG under mild assumptions.
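To illustrate why interventions can break the circularity described above, the following is a minimal simulation sketch, not the $\texttt{dotears}$ implementation. It assumes a linear structural equation model $X = XW + \epsilon$ with Gaussian noise, and it assumes that a hard intervention on node $k$ removes all incoming edges of $k$ while leaving the noise distribution of $k$ unchanged; that second assumption is a simplification made here for illustration. The function name `simulate_linear_sem` and the example weights are hypothetical. Under these assumptions the intervened node equals its own noise term, so its sample variance estimates the exogenous variance without any knowledge of $W$.

```python
import numpy as np


def simulate_linear_sem(W, noise_std, n, rng, intervened=None):
    """Draw n samples from X = X W + eps (hypothetical helper).

    W[i, j] is the weight of edge i -> j. If `intervened` is a node index,
    all incoming edges of that node are removed (hard intervention).
    Assumption: the intervention does not change the node's noise scale.
    """
    W = W.copy()
    if intervened is not None:
        W[:, intervened] = 0.0  # cut every parent -> intervened edge
    p = W.shape[0]
    # One noise column per node, each with its own standard deviation.
    eps = rng.normal(0.0, noise_std, size=(n, p))
    # X = X W + eps  implies  X = eps (I - W)^{-1} for an acyclic W.
    return eps @ np.linalg.inv(np.eye(p) - W)


rng = np.random.default_rng(0)

# Illustrative chain 0 -> 1 -> 2 with unequal exogenous variances.
W = np.array([
    [0.0, 2.0, 0.0],
    [0.0, 0.0, -1.5],
    [0.0, 0.0, 0.0],
])
noise_std = np.array([1.0, 0.5, 2.0])
n = 50_000

# Observational variances mix noise variance with inherited upstream variance.
obs_var = simulate_linear_sem(W, noise_std, n, rng).var(axis=0)

# In the regime where node k is intervened on, Var(X_k) is the noise variance.
sigma2_hat = np.array([
    simulate_linear_sem(W, noise_std, n, rng, intervened=k)[:, k].var()
    for k in range(W.shape[0])
])
```

In this sketch `sigma2_hat` should be close to `noise_std ** 2`, while `obs_var` is inflated for downstream nodes by variance inherited from their ancestors. That gap is what makes variance structure hard to recover from observational data alone when the graph is unknown.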