Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. To measure perturbation-specific model properties, we also introduce a framework for evaluating perturbation models based on average treatment effects, with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
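The additive, sparse latent composition described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `compose_latent`, the binary-mask parameterization of sparsity, and the toy dimensions are all assumptions made for illustration, and the sketch covers only the latent sum, not the encoder, decoder, or inference procedure.

```python
import numpy as np


def compose_latent(z_basal, dosages, embeddings, masks):
    """Hypothetical helper: add sparse perturbation shifts to a basal latent state.

    z_basal:    (n_samples, n_latent) sample-specific ("local") latent variables
    dosages:    (n_samples, n_perts)  indicator of which perturbations each sample received
    embeddings: (n_perts, n_latent)   global latent effect of each perturbation
    masks:      (n_perts, n_latent)   binary masks that sparsify each global effect
    """
    sparse_effects = embeddings * masks          # zero out masked latent dimensions
    return z_basal + dosages @ sparse_effects    # sum the effects applied to each sample


rng = np.random.default_rng(0)
n_samples, n_perts, n_latent = 4, 3, 5

z_basal = rng.normal(size=(n_samples, n_latent))
embeddings = rng.normal(size=(n_perts, n_latent))

# Assumed toy masks: each perturbation acts on its own small latent subspace.
masks = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

# Sample 0 is a control, samples 1 and 2 receive single perturbations,
# and sample 3 receives perturbations 0 and 1 together.
dosages = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
], dtype=float)

z = compose_latent(z_basal, dosages, embeddings, masks)
shift = z - z_basal  # latent shift attributable to the perturbations alone
```

In this toy setup the shift for the doubly perturbed sample equals the sum of the two single-perturbation shifts, which is the sense in which the latent effects compose, and each single perturbation moves only the latent dimensions its mask keeps.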