Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

From arXiv. Presented at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Post-NeurIPS fixes: cosmetic fixes, updated references, added simulation to appendix.

Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets. To measure perturbation-specific model properties, we also introduce a framework for evaluating perturbation models based on average treatment effects, with links to posterior predictive checks. SAMS-VAE outperforms comparable models in generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms. Our results suggest that SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
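The additive latent structure and the treatment-effect evaluation described above can be illustrated with a small sketch. It follows the abstract's description: a perturbed sample's latent state is its local (basal) latent plus the sum, over active perturbations, of each perturbation's global effect vector restricted by a sparse binary mask. This is not the authors' implementation, and every name here is hypothetical: `compose_latent`, `decode`, `average_treatment_effect`, the labels `geneA_KO`/`geneB_KO`, and the hand-set masks and effects. In SAMS-VAE the masks and effects are inferred global latent variables and the decoder is a neural network. Here a toy linear decoder stands in so the arithmetic is easy to check. The ATE estimate compares mean decoded outputs with and without the perturbation over shared basal samples, which is one simple way to estimate it and an assumption of this sketch.

```python
import random


def compose_latent(z_basal, effects, masks, active):
    """Add sparse, masked perturbation effects to a sample-specific basal latent.

    z_basal: local latent for one sample (list of floats, length D)
    effects: hypothetical dense global effect vector per perturbation
    masks:   hypothetical binary mask per perturbation (1 = dimension affected)
    active:  perturbations applied to this sample
    """
    z = list(z_basal)
    for t in active:
        for k, (e, m) in enumerate(zip(effects[t], masks[t])):
            z[k] += m * e  # only masked-in latent dimensions are shifted
    return z


def decode(z, weights):
    """Toy linear decoder standing in for the VAE's neural decoder (assumption)."""
    return [sum(w * zk for w, zk in zip(row, z)) for row in weights]


def average_treatment_effect(active, effects, masks, weights, n_samples=1000, seed=0):
    """Monte Carlo ATE: mean decoded output under `active` minus mean under control."""
    rng = random.Random(seed)
    dim, n_out = len(weights[0]), len(weights)
    ate = [0.0] * n_out
    for _ in range(n_samples):
        z_basal = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # basal latent ~ N(0, I)
        x_treated = decode(compose_latent(z_basal, effects, masks, active), weights)
        x_control = decode(z_basal, weights)
        for j in range(n_out):
            ate[j] += (x_treated[j] - x_control[j]) / n_samples
    return ate


# Toy example: two hypothetical knockouts acting on disjoint latent dimensions.
effects = {"geneA_KO": [2.0, -1.0, 0.5], "geneB_KO": [0.0, 3.0, 1.0]}
masks = {"geneA_KO": [1, 0, 0], "geneB_KO": [0, 1, 0]}
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 2 outputs from 3 latent dims

print(compose_latent([0.0, 0.0, 0.0], effects, masks, ["geneA_KO", "geneB_KO"]))
# [2.0, 3.0, 0.0]: masked-out entries of each effect vector are ignored
ate = average_treatment_effect(["geneA_KO", "geneB_KO"], effects, masks, weights)
print([round(v, 6) for v in ate])
# [2.0, 3.0]
```

Because this decoder is linear, the ATE of the combined perturbation equals the sum of the individual ATEs. With a nonlinear decoder, only the latent shifts compose additively, not necessarily their effects on the observations.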
