Mixtures of factor analysers (MFA) are a popular tool for finding structure in data, particularly high-dimensional data. In most applications, the number of clusters, and especially the number of latent factors within each cluster, is fixed in advance; however, models that automatically infer both the number of clusters and the number of latent factors have recently been introduced. Such automatic inference is usually achieved by assigning a nonparametric prior that allows the number of clusters and factors to potentially go to infinity. MCMC estimation is then performed with an adaptive algorithm in which the parameters associated with redundant factors are discarded as the chain moves. While this approach has clear advantages, it also has significant drawbacks. Running a separate factor-analytic model for each cluster involves matrices of changing dimensions, which can make both the model and its implementation cumbersome. In addition, discarding the parameters associated with redundant factors could bias the estimates of the cluster covariance matrices. Finally, identification remains problematic for infinite factor models. The current work contributes to the MFA literature by enabling automatic inference on the number of clusters and on the number of cluster-specific factors while keeping both the cluster and factor dimensions finite. This allows us to avoid many of the aforementioned drawbacks of infinite models. For automatic inference on the cluster structure, we employ the dynamic mixture of finite mixtures (MFM) model. Automatic inference on the cluster-specific factors is performed by assigning an exchangeable shrinkage process (ESP) prior to the columns of the factor loading matrices. The performance of the model is demonstrated on several benchmark data sets as well as in real-data applications.
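To make "keeping both cluster and factor dimensions finite" concrete, here is a minimal sketch of the generative side of a finite MFA, assuming the standard formulation y_i | z_i = k ~ N(μ_k, Λ_k Λ_kᵀ + Σ_k) with a cluster-specific diagonal Σ_k (one common variant). Every cluster carries a loading matrix with the same fixed number of columns H; clusters differ in their effective number of factors only through columns that are zero. The function names `simulate_mfa` and `cluster_covariance` are hypothetical, the zero columns are set by hand rather than inferred, and neither the ESP prior nor the dynamic MFM sampler is implemented here.

```python
import numpy as np

def simulate_mfa(n, mu, Lambda, sigma2, weights, rng):
    """Draw n observations from a finite mixture of factor analysers (hypothetical helper).

    mu:      (K, p) cluster means
    Lambda:  (K, p, H) loading matrices, all with the same fixed H columns
    sigma2:  (K, p) idiosyncratic variances (diagonal of Sigma_k)
    weights: (K,) mixture weights
    """
    K, p, H = Lambda.shape
    z = rng.choice(K, size=n, p=weights)                      # cluster labels
    f = rng.standard_normal((n, H))                           # factors f_i ~ N(0, I_H)
    eps = rng.standard_normal((n, p)) * np.sqrt(sigma2[z])    # idiosyncratic errors
    y = mu[z] + np.einsum("nph,nh->np", Lambda[z], f) + eps   # y_i = mu_k + Lambda_k f_i + eps_i
    return y, z

def cluster_covariance(Lambda_k, sigma2_k):
    # Implied covariance of cluster k: Lambda_k Lambda_k^T + Sigma_k
    return Lambda_k @ Lambda_k.T + np.diag(sigma2_k)

rng = np.random.default_rng(0)
K, p, H = 2, 5, 4                       # fixed (finite) numbers of clusters and factor columns
mu = np.zeros((K, p))
mu[1] += 3.0
Lambda = rng.standard_normal((K, p, H))
Lambda[0, :, 2:] = 0.0                  # cluster 0: only 2 active factor columns
Lambda[1, :, 3:] = 0.0                  # cluster 1: 3 active factor columns
sigma2 = np.full((K, p), 0.5)
weights = np.array([0.6, 0.4])

y, z = simulate_mfa(1000, mu, Lambda, sigma2, weights, rng)

# Effective number of factors per cluster = number of nonzero loading columns
active = (np.abs(Lambda) > 1e-8).any(axis=1).sum(axis=1)

# Zero columns contribute nothing to the covariance, so the matrix dimension can stay fixed
cov_full = cluster_covariance(Lambda[0], sigma2[0])
cov_trim = cluster_covariance(Lambda[0][:, :2], sigma2[0])
print(active, np.allclose(cov_full, cov_trim))
```

Because exactly-zero columns leave Λ_k Λ_kᵀ unchanged, the loading arrays never need to be resized; by contrast, removing a column that is small but nonzero would alter the implied covariance, which is the kind of bias the abstract attributes to discarding redundant factors.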