Predicting perturbation targets with causal differential networks

Rationally identifying variables responsible for changes to a biological system can enable myriad applications in disease understanding and cell engineering. From a causality perspective, we are given two datasets generated by the same causal model, one observational (control) and one interventional (perturbed). The goal is to isolate the subset of measured variables (e.g. genes) that were the targets of the intervention, i.e. those whose conditional independencies have changed. Knowing the causal graph would limit the search space, allowing us to efficiently pinpoint these variables. However, current algorithms that infer causal graphs in the presence of unknown intervention targets scale poorly to the hundreds or thousands of variables in biological data, as they must jointly search the combinatorial spaces of graphs and consistent intervention targets. In this work, we propose a causality-inspired approach for predicting perturbation targets that decouples the two search steps. First, we use an amortized causal discovery model to separately infer causal graphs from the observational and interventional datasets. Then, we learn to map these paired graphs to the sets of variables that were intervened upon, in a supervised learning framework. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets, each with thousands of measured variables. We also demonstrate significant improvements over six causal discovery algorithms in predicting intervention targets across a variety of tractable, synthetic datasets.
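The decoupled two-step idea can be illustrated with a toy sketch. This is not the authors' implementation: the amortized causal discovery model is replaced here by ordinary least-squares regression under an assumed-known causal order, and the learned supervised mapping is replaced by a hand-written rule that scores each variable by how much its incoming edge weights change. The chain `x0 -> x1 -> x2`, the isolated `x3`, and all function names (`simulate`, `estimate_graph`, `target_scores`) are hypothetical and chosen only for illustration.

```python
import random


def simulate(n, intervene_on_x1, seed):
    """Sample a toy linear model: x0 -> x1 -> x2, with x3 isolated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 0.9 * x0 + rng.gauss(0, 0.5)
        if intervene_on_x1:
            x1 = rng.gauss(0, 1)  # hard intervention: x1 no longer depends on x0
        x2 = 0.8 * x1 + rng.gauss(0, 0.5)
        x3 = rng.gauss(0, 1)
        rows.append([x0, x1, x2, x3])
    return rows


def solve(a, b):
    """Solve the small linear system a w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_graph(rows):
    """Stand-in graph estimator (assumption: the causal order 0, 1, 2, ... is known).

    Regresses each variable on all of its predecessors and returns w,
    where w[i][j] is the estimated weight of the edge i -> j.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    w = [[0.0] * d for _ in range(d)]
    for j in range(1, d):
        a = [[cov[p][q] for q in range(j)] for p in range(j)]
        b = [cov[p][j] for p in range(j)]
        for i, coef in enumerate(solve(a, b)):
            w[i][j] = coef
    return w


def target_scores(w_obs, w_int):
    """Hand-written stand-in for the learned mapping: total change in incoming edges."""
    d = len(w_obs)
    return [sum(abs(w_obs[i][j] - w_int[i][j]) for i in range(d)) for j in range(d)]


# Step 1: infer one graph per dataset, separately.
w_obs = estimate_graph(simulate(2000, intervene_on_x1=False, seed=0))
w_int = estimate_graph(simulate(2000, intervene_on_x1=True, seed=1))

# Step 2: map the pair of graphs to a ranking of candidate targets.
scores = target_scores(w_obs, w_int)
predicted_target = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted_target)
```

In this toy setting the hard intervention removes the edge `x0 -> x1`, so the incoming-edge score for `x1` should dominate while the other variables change only by sampling noise. The sketch only conveys why comparing two separately inferred graphs can localize a target; the known-order assumption and the fixed scoring rule are simplifications that the approach described above does not rely on.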
