Phenotype-based screening has attracted much attention as a way to identify cell-active compounds. Transcriptional and proteomic profiles of cell populations or single cells are informative phenotypic measures of cellular responses to perturbations. In this paper, we propose a deep learning framework based on an encoder-decoder architecture that maps initial cellular states to a latent space, in which we assume that the effects of drug perturbations on cellular states are linearly additive. We then introduce cycle consistency constraints, which enforce that an initial cellular state subjected to a drug perturbation produces the perturbed cellular response and, conversely, that removing the drug perturbation from the perturbed cellular state restores the initial cellular state. Together, the cycle consistency constraints and the linear modeling in latent space enable the model to learn interpretable and transferable representations of drug perturbations, so that it can predict cellular responses to unseen drugs. We validated our model on three different types of datasets: bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations. The experimental results show that our model achieves better performance than existing state-of-the-art methods.
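The latent-space arithmetic and the cycle consistency constraints described above can be illustrated with a small toy sketch. This is not the authors' implementation: the encoder and decoder here are fixed 2x2 linear maps rather than trained neural networks, mean squared error is assumed as the reconstruction loss, and all names (`encode`, `decode`, `perturb`, `unperturb`, `cycle_losses`, `drug_shift`) are hypothetical.

```python
# Toy sketch of linear additivity in latent space with cycle consistency.
# Assumptions (not from the paper): fixed linear encoder/decoder,
# 2-dimensional cellular states, mean squared error as the loss.

W_ENC = [[2.0, 0.0], [0.0, 0.5]]  # hypothetical encoder weights
W_DEC = [[0.5, 0.0], [0.0, 2.0]]  # hypothetical decoder weights (inverse of W_ENC)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(state):
    """Map a cellular state to the latent space."""
    return matvec(W_ENC, state)


def decode(latent):
    """Map a latent vector back to a cellular state."""
    return matvec(W_DEC, latent)


def mse(u, v):
    """Mean squared error between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def perturb(state, drug_shift):
    """Apply a drug: add its latent shift to the encoded state, then decode."""
    latent = encode(state)
    return decode([z + d for z, d in zip(latent, drug_shift)])


def unperturb(state, drug_shift):
    """Remove a drug: subtract its latent shift from the encoded state, then decode."""
    latent = encode(state)
    return decode([z - d for z, d in zip(latent, drug_shift)])


def cycle_losses(initial, perturbed, drug_shift):
    """Return (forward, backward) cycle consistency losses.

    forward:  initial state + drug should match the perturbed state.
    backward: perturbed state - drug should match the initial state.
    """
    forward = mse(perturb(initial, drug_shift), perturbed)
    backward = mse(unperturb(perturbed, drug_shift), initial)
    return forward, backward


x0 = [1.0, 4.0]            # toy initial cellular state
drug_shift = [2.0, -1.0]   # toy latent representation of one drug
y0 = perturb(x0, drug_shift)  # toy perturbed cellular state

forward_loss, backward_loss = cycle_losses(x0, y0, drug_shift)
print(forward_loss, backward_loss)
```

In a trained model, the encoder, decoder, and drug representations would be learned, and both loss terms would presumably be minimized jointly with other objectives. The sketch only shows why a single additive latent vector per drug supports both applying and removing a perturbation.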