Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
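The core modeling idea -- representing perturbation, readout, and context as disentangled dimensions -- can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the LPM implementation: the embedding tables, their sizes, and the linear prediction head are assumptions made only to show how a measurement can be addressed by three independent indices.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # embedding dimension (assumed for illustration)

# Hypothetical embedding tables, one per disentangled dimension:
# 10 perturbations, 100 readouts (e.g. genes), 5 experimental contexts.
E_pert = rng.normal(size=(10, dim))
E_read = rng.normal(size=(100, dim))
E_ctx = rng.normal(size=(5, dim))

# Hypothetical prediction head mapping the combined representation
# to a scalar post-perturbation measurement.
W = rng.normal(size=(3 * dim,))

def predict(pert_id: int, readout_id: int, ctx_id: int) -> float:
    """Predict one readout value from three independent indices.

    Because perturbation, readout, and context are looked up separately,
    the same perturbation embedding is reused across every readout and
    context it appears in -- this is what lets heterogeneous experiments
    share statistical strength.
    """
    z = np.concatenate([E_pert[pert_id], E_read[readout_id], E_ctx[ctx_id]])
    return float(z @ W)

# Query an unseen combination: perturbation 3, readout 42, context 1.
y = predict(3, 42, 1)
print(type(y).__name__)
```

In this factorized form, a combination of perturbation, readout, and context never observed together can still be scored, which is the property the abstract attributes to LPM when predicting transcriptomes of unseen experiments.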