Bayesian nonparametric mixture models are widely used to model complex data. Although these models are well suited to density estimation, recent results have proved that the posterior on the number of clusters is inconsistent for Dirichlet process and Pitman--Yor process mixture models when the true number of components is finite. We extend these results to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman--Yor multinomial and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters, and we discuss possible solutions. Notably, we show that a post-processing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method to estimate the number of components.
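To make the distinction between a process and its finite-dimensional representation concrete, the following minimal sketch simulates the number of clusters induced a priori by a Dirichlet process (via the Chinese restaurant urn scheme) and by a Dirichlet multinomial process with K atoms and symmetric Dirichlet(alpha/K) weights. It illustrates the prior on partitions only; it does not reproduce the posterior inconsistency results or the post-processing algorithm. The function name `sample_num_clusters` and its parameters are hypothetical and chosen for illustration.

```python
import random


def sample_num_clusters(n, alpha, K=None, seed=0):
    """Simulate a random partition of n items and return its number of clusters.

    K=None  : Dirichlet process urn (Chinese restaurant process), concentration alpha.
    K=int>0 : Dirichlet multinomial process with K atoms, i.e. the urn induced by
              symmetric Dirichlet(alpha / K) weights. (Illustrative assumption:
              this is the standard finite-dimensional Dirichlet construction.)
    """
    rng = random.Random(seed)
    counts = []  # counts[j] = current size of cluster j
    for _ in range(n):
        k = len(counts)
        if K is None:
            # Join cluster j with weight n_j; open a new cluster with weight alpha.
            existing = [float(c) for c in counts]
            new = alpha
        else:
            # Join cluster j with weight n_j + alpha/K; open one of the K - k
            # unoccupied atoms with total weight alpha * (1 - k/K).
            existing = [c + alpha / K for c in counts]
            new = alpha * (1.0 - k / K)
        # Drop the "new cluster" option once its weight is zero (all K atoms used).
        weights = existing + ([new] if new > 0 else [])
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
    return len(counts)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        dp = sample_num_clusters(n, alpha=1.0)
        dmp = sample_num_clusters(n, alpha=1.0, K=20)
        print(f"n={n}: DP clusters={dp}, Dirichlet multinomial (K=20) clusters={dmp}")
```

Under the Dirichlet process urn the number of clusters can keep growing with n, whereas under the K-atom version it is capped at K. The abstract's claim concerns the posterior of mixture models built on such priors, which this prior simulation does not address.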