Causal structure learning (CSL) refers to the task of learning causal relationships from data. Advances in CSL now allow learning of causal graphs in diverse application domains, which has the potential to facilitate data-driven causal decision-making. Real-world CSL performance depends on a number of $\textit{context-specific}$ factors, including context-specific data distributions and non-linear dependencies, that are important in practical use-cases. However, our understanding of how to assess and select CSL methods in specific contexts remains limited. To address this gap, we present $\textit{CausalRegNet}$, a multiplicative effect structural causal model that allows for generating observational and interventional data incorporating context-specific properties, with a focus on the setting of gene perturbation experiments. Using real-world gene perturbation data, we show that CausalRegNet generates accurate distributions and scales far better than current simulation frameworks. We illustrate the use of CausalRegNet in assessing CSL methods in the context of interventional experiments in biology.