A mixture of multivariate Poisson-log normal factor analyzers is introduced by imposing constraints on the covariance matrix, which results in flexible models for clustering purposes. In particular, a class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced. Variational Gaussian approximation is used for parameter estimation, and information criteria are used for model selection. The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies. Using real and simulated data, the models are shown to give favourable clustering performance. An R package implementing this work is available on GitHub at https://github.com/anjalisilva/mixMPLNFA and is released under the open-source MIT license.
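For orientation, the hierarchy usually meant by a mixture of multivariate Poisson-log normal factor analyzers can be sketched as below. The abstract does not state the model explicitly, so the notation ($Y_{ij}$, $\theta_i$, $Z_{ig}$, $\pi_g$, $\mu_g$, $\Lambda_g$, $\Psi_g$, $q$) is assumed here for illustration, and any normalization term for library size is omitted.

```latex
\begin{align*}
Y_{ij} \mid \theta_{ij} &\sim \mathrm{Poisson}\!\left(e^{\theta_{ij}}\right), \qquad j = 1, \dots, d, \\
\theta_i \mid (Z_{ig} = 1) &\sim \mathcal{N}_d\!\left(\mu_g, \Sigma_g\right), \qquad \Pr(Z_{ig} = 1) = \pi_g, \\
\Sigma_g &= \Lambda_g \Lambda_g^{\top} + \Psi_g .
\end{align*}
```

Here $\Lambda_g$ is a $d \times q$ loading matrix with $q < d$ and $\Psi_g$ is a diagonal matrix with positive entries. One standard way to arrive at eight parsimonious models, assumed here rather than taken from the abstract, is to switch each of three constraints on or off: $\Lambda_g = \Lambda$, $\Psi_g = \Psi$, and $\Psi_g = \psi_g I_d$, which gives $2^3 = 8$ covariance structures.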