Finite mixture models are flexible methods that are commonly used for model-based clustering. A recent focus in the model-based clustering literature has been to highlight the difference between the number of components in a mixture model and the number of clusters. The number of clusters is more relevant from a practical standpoint, but to date, prior formulation has focused on the number of components. In light of this, we develop a finite mixture methodology that permits eliciting prior information directly on the number of clusters in an intuitive way. This is done by employing an asymmetric Dirichlet distribution as a prior on the weights of a finite mixture. Further, a prior motivated by the penalized complexity framework is employed for the Dirichlet shape parameter. We illustrate the ease with which prior information can be elicited via our construction and the flexibility of the resulting induced prior on the number of clusters. We also demonstrate the utility of our approach using numerical experiments and two real-world data sets.
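The distinction between components and clusters can be made concrete by simulation: with K components, the number of clusters is the number of components that actually receive at least one of the n observations. The sketch below, which assumes nothing beyond a Dirichlet prior on the weights, estimates the prior on the number of clusters that a given Dirichlet shape vector induces. It does not implement the asymmetric parameterization or the penalized complexity prior described above; the function names and the two shape vectors in the usage example are hypothetical choices made only for illustration.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw mixture weights from Dirichlet(alpha) by normalizing
    independent Gamma(alpha_k, 1) variables."""
    while True:
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        # Very small shape values can underflow to 0.0; redraw in that rare case.
        if total > 0.0:
            return [x / total for x in g]


def induced_cluster_prior(alpha, n, n_sims=2000, seed=0):
    """Monte Carlo estimate of P(K+ = k), where K+ is the number of
    non-empty components (clusters) among n observations."""
    rng = random.Random(seed)
    K = len(alpha)
    counts = Counter()
    for _ in range(n_sims):
        w = sample_dirichlet(alpha, rng)
        # Allocate each observation to a component according to the weights.
        labels = rng.choices(range(K), weights=w, k=n)
        counts[len(set(labels))] += 1
    return {k: counts[k] / n_sims for k in range(1, K + 1)}


def expected_clusters(prior):
    """Mean of the induced prior on the number of clusters."""
    return sum(k * p for k, p in prior.items())


if __name__ == "__main__":
    K, n = 10, 100
    symmetric = [1.0] * K                 # hypothetical symmetric choice
    asymmetric = [1.0] + [0.01] * (K - 1)  # hypothetical asymmetric choice
    for name, alpha in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
        prior = induced_cluster_prior(alpha, n)
        print(name, round(expected_clusters(prior), 2))
```

With K = 10 components, the symmetric Dirichlet(1, ..., 1) prior places most of its mass on nine or ten clusters, whereas shrinking most shape values toward zero concentrates the induced prior on a small number of clusters even though K is unchanged. This is the kind of lever that a prior on the weights gives for encoding beliefs about the number of clusters.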