New biological assays like Perturb-seq link highly parallel CRISPR interventions to a high-dimensional transcriptomic readout, providing insight into gene regulatory networks. Causal gene regulatory networks can be represented by directed acyclic graphs (DAGs), but learning DAGs from observational data is complicated by a lack of identifiability and a combinatorial solution space. Score-based structure learning improves the practical scalability of DAG inference. However, previous score-based methods are sensitive to error variance structure, while error variance is itself difficult to estimate without prior knowledge of the structure. Accordingly, we present $\texttt{dotears}$ [doo-tairs], a continuous optimization framework that leverages observational and interventional data to infer a single causal structure, assuming a linear Structural Equation Model (SEM). $\texttt{dotears}$ exploits structural consequences of hard interventions to give a marginal estimate of exogenous error structure, bypassing the circular estimation problem. We show that $\texttt{dotears}$ is a provably consistent estimator of the true DAG under mild assumptions. $\texttt{dotears}$ outperforms other methods in varied simulations, and in real data it infers edges that validate with higher precision and recall than those of state-of-the-art methods, as assessed by differential expression tests and high-confidence protein-protein interactions.
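To illustrate why hard interventions sidestep the circular estimation problem, the following is a minimal sketch and not the $\texttt{dotears}$ implementation. It assumes a toy three-gene linear SEM, Gaussian noise, and that a hard intervention on a gene removes all of its incoming edges while leaving its exogenous noise unchanged; the abstract does not state these details, and the function names `simulate_linear_sem` and `estimate_noise_variances` are hypothetical. Under these assumptions the targeted gene equals its own noise term, so its sample variance in the matching interventional dataset estimates the exogenous variance without knowing the graph.

```python
import numpy as np


def simulate_linear_sem(W, sigma, n, rng, target=None):
    """Sample n rows from the linear SEM X = X W + eps.

    W[i, j] is the weight of edge i -> j. If `target` is given, a hard
    intervention is applied by zeroing every incoming edge of that node
    (assumption: the node's exogenous noise is left unchanged).
    """
    W = np.array(W, dtype=float)
    if target is not None:
        W[:, target] = 0.0  # cut all parents of the targeted node
    p = W.shape[0]
    eps = rng.normal(0.0, sigma, size=(n, p))  # column j has std sigma[j]
    # X (I - W) = eps  =>  X = eps (I - W)^{-1}
    return eps @ np.linalg.inv(np.eye(p) - W)


def estimate_noise_variances(interventional_data):
    """Marginal variance of node k in the dataset where k was intervened on."""
    p = len(interventional_data)
    return np.array(
        [interventional_data[k][:, k].var(ddof=1) for k in range(p)]
    )


# Toy chain 0 -> 1 -> 2 with unequal noise scales (illustrative values only).
W_true = np.array([[0.0, 2.0, 0.0],
                   [0.0, 0.0, -1.5],
                   [0.0, 0.0, 0.0]])
sigma_true = np.array([1.0, 2.0, 0.5])
rng = np.random.default_rng(0)

observational = simulate_linear_sem(W_true, sigma_true, 20000, rng)
interventional = {
    k: simulate_linear_sem(W_true, sigma_true, 20000, rng, target=k)
    for k in range(3)
}

sigma2_hat = estimate_noise_variances(interventional)
obs_var = observational.var(axis=0, ddof=1)
print("estimated noise variances:", sigma2_hat)
print("observational variances:  ", obs_var)
```

In this toy setting the observational variance of a downstream gene mixes its own noise with variance inherited from its parents, whereas the interventional estimate isolates the noise term. A score that depends on error variances could then take these estimates as fixed inputs; how $\texttt{dotears}$ actually uses them is not specified in the abstract.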