Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based single-cell RNA sequencing (scRNA-seq) analysis by allowing integration and clustering to be performed simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a variational autoencoder (VAE) attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
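As a rough illustration of the kind of prior being adapted, the sketch below evaluates a VampPrior-style mixture density in NumPy. The original VampPrior is a uniform mixture of the variational posterior evaluated at learned pseudo-inputs; the optional non-uniform `log_weights` and the returned responsibilities are assumptions added here to suggest how a mixture prior can yield soft cluster assignments. The function names, the `encoder` callable, and its `(means, log_vars)` return convention are all hypothetical, and neither the Dirichlet process treatment nor the alternating inference procedure is reproduced.

```python
import numpy as np


def log_gaussian_diag(z, mean, log_var):
    """Log density of a diagonal Gaussian, summed over the latent dimensions."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var),
        axis=-1,
    )


def vamp_prior_log_prob(z, encoder, pseudo_inputs, log_weights=None):
    """Evaluate log p(z) = logsumexp_k [log pi_k + log q(z | u_k)].

    z:             (N, D) latent samples.
    encoder:       hypothetical callable mapping (K, X) pseudo-inputs to
                   component means and log-variances, each of shape (K, D).
    pseudo_inputs: (K, X) array of pseudo-inputs u_k.
    log_weights:   (K,) log mixture weights; None means uniform weights,
                   as in the original VampPrior.
    Returns the (N,) log densities and the (N, K) responsibilities.
    """
    means, log_vars = encoder(pseudo_inputs)
    num_components = means.shape[0]
    if log_weights is None:
        log_weights = np.full(num_components, -np.log(num_components))

    # Broadcast to (N, K): log density of every sample under every component.
    component_log_probs = log_gaussian_diag(
        z[:, None, :], means[None, :, :], log_vars[None, :, :]
    )
    joint = component_log_probs + log_weights[None, :]

    # Numerically stable log-sum-exp over the K components.
    max_joint = joint.max(axis=1, keepdims=True)
    log_prob = (max_joint + np.log(np.exp(joint - max_joint).sum(axis=1, keepdims=True)))[:, 0]

    # Posterior probability that each sample belongs to each component.
    responsibilities = np.exp(joint - log_prob[:, None])
    return log_prob, responsibilities
```

Taking the `argmax` of the responsibilities along the component axis gives a hard cluster label per sample, which is one simple way a mixture prior over the latent space can double as a clustering.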