Generative models of observations under interventions have attracted growing interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder (SAMS-VAE), which combines compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. To measure perturbation-specific model properties, we also introduce a framework for evaluating perturbation models based on average treatment effects, with links to posterior predictive checks. SAMS-VAE outperforms comparable models in generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms. Our results suggest that SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
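The additive latent structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `compose_latent`, the array names, the fixed binary masks, and the toy dimensions are all assumptions made for illustration. In the model itself the masks and effects would be latent variables inferred from data rather than hand-set constants.

```python
import numpy as np


def compose_latent(z_basal, dosages, effects, masks):
    """Add sparse, perturbation-specific shifts to a sample-specific latent.

    z_basal: (n_samples, n_latent) local latent variable per sample
    dosages: (n_samples, n_perts) indicator of which perturbations were applied
    effects: (n_perts, n_latent) global latent effect of each perturbation
    masks:   (n_perts, n_latent) binary mask selecting the latent dimensions
             each perturbation is allowed to shift (the sparsity assumption)
    """
    sparse_effects = masks * effects       # zero out unselected dimensions
    return z_basal + dosages @ sparse_effects


rng = np.random.default_rng(0)
n_samples, n_perts, n_latent = 4, 3, 5

z_basal = rng.normal(size=(n_samples, n_latent))
effects = rng.normal(size=(n_perts, n_latent))

# Hypothetical masks: each perturbation acts on its own small latent subspace.
masks = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1],
    ],
    dtype=float,
)

# Row 0: control, row 1: perturbation 0, row 2: perturbation 1,
# row 3: perturbations 0 and 1 applied together.
dosages = np.array(
    [
        [0, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [1, 1, 0],
    ],
    dtype=float,
)

z = compose_latent(z_basal, dosages, effects, masks)
shift = z - z_basal  # latent shift attributable to the perturbations
```

In this sketch the shift for the combined perturbation (row 3) equals the sum of the two single-perturbation shifts (rows 1 and 2), and each single perturbation moves only the latent dimensions selected by its mask. That is the sense in which sparse additive effects are composable and confined to perturbation-specific subspaces.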