Current clustering priors for deep latent variable models (DLVMs) require the number of clusters to be defined a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep-learning-based scRNA-seq analysis by allowing integration and clustering to be performed simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
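For readers unfamiliar with the VampPrior that the VMM builds on, the following is a minimal sketch of its central idea: the prior over the latent code is a mixture whose components are the encoder's approximate posteriors evaluated at learnable pseudo-inputs. This illustrates the construction of Tomczak & Welling (2018), not the VMM inference procedure itself. The linear `toy_encoder`, the function names, and the optional `log_weights` argument are all assumptions made for illustration; the original VampPrior uses uniform weights, and how the VMM sets its mixture weights is not specified here.

```python
import numpy as np

def toy_encoder(x, W, b):
    """Hypothetical linear encoder: maps inputs to the mean and
    log-variance of a diagonal Gaussian q(z | x)."""
    h = x @ W + b
    mean, log_var = np.split(h, 2, axis=-1)
    return mean, log_var

def log_diag_gaussian(z, mean, log_var):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var),
        axis=-1,
    )

def vamp_prior_log_prob(z, pseudo_inputs, W, b, log_weights=None):
    """log p(z) = log sum_k w_k q(z | u_k), where u_k are pseudo-inputs.

    z: (N, D) latent codes; pseudo_inputs: (K, input_dim).
    log_weights defaults to uniform (1/K), as in the original VampPrior;
    passing other weights is an illustrative hook, not the VMM's rule.
    """
    mean, log_var = toy_encoder(pseudo_inputs, W, b)           # (K, D) each
    comp = log_diag_gaussian(z[:, None, :], mean[None], log_var[None])  # (N, K)
    K = pseudo_inputs.shape[0]
    if log_weights is None:
        log_weights = np.full(K, -np.log(K))
    a = comp + log_weights
    # Numerically stable log-sum-exp over the K components.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

# Usage with made-up dimensions: 5-dimensional inputs, 2-dimensional latents.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 4)), np.zeros(4)
pseudo_inputs = rng.normal(size=(3, 5))   # K = 3 pseudo-inputs
z = rng.normal(size=(7, 2))
log_p = vamp_prior_log_prob(z, pseudo_inputs, W, b)  # shape (7,)
```

Because the mixture components are produced by the encoder itself, updating the pseudo-inputs changes the prior without introducing a separate set of component means and variances.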