Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. To measure perturbation-specific model properties, we also introduce a framework for evaluating perturbation models based on average treatment effects, with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
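The additive latent composition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `compose_latent`, the array names, the binary masks, and the toy values are all assumptions made for illustration, and the variational inference and decoder that a full model would need are omitted. It only shows how a sample's latent state could be formed as a sample-specific term plus the sum of sparse, perturbation-specific shifts.

```python
import numpy as np


def compose_latent(z_basal, dosages, masks, embeddings):
    """Hypothetical sketch: add sparse perturbation shifts to a basal latent state.

    z_basal:    (n_samples, n_latent) sample-specific (local) latent variables
    dosages:    (n_samples, n_perts)  which perturbations each sample received
    masks:      (n_perts, n_latent)   binary sparsity masks (global, assumed)
    embeddings: (n_perts, n_latent)   dense latent effects (global, assumed)
    """
    # Each perturbation only shifts the latent dimensions its mask keeps.
    sparse_effects = masks * embeddings
    # Effects of all perturbations applied to a sample are summed.
    return z_basal + dosages @ sparse_effects


rng = np.random.default_rng(0)
z_basal = rng.normal(size=(3, 4))  # 3 samples, 4 latent dimensions

# Two perturbations, each restricted to a single latent dimension.
masks = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
embeddings = np.array([[2.0, 5.0, 5.0, 5.0],
                       [7.0, 7.0, -1.0, 7.0]])

# Sample 0: control; sample 1: perturbation 0; sample 2: both perturbations.
dosages = np.array([[0, 0],
                    [1, 0],
                    [1, 1]], dtype=float)

z = compose_latent(z_basal, dosages, masks, embeddings)
shift = z - z_basal  # latent shift attributable to the perturbations
```

In this toy setup the control sample is unshifted, and the doubly perturbed sample receives the sum of the two single-perturbation shifts, each confined to its own masked dimension. That is the sense in which sparse, perturbation-specific subspaces compose additively.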

