Mixture models are widely used for modeling heterogeneous data populations. A standard approach to mixture modeling assumes that each mixture component takes a parametric kernel form. In many applications, parametric assumptions on the latent subpopulation distributions may be unrealistic, which motivates nonparametric modeling of the mixture components themselves. In this paper, we study finite mixtures with nonparametric mixture components, using a Bayesian nonparametric modeling approach. In particular, we assume that the data population is generated according to a finite mixture of latent component distributions, where each component is endowed with a Bayesian nonparametric prior such as the Dirichlet process mixture. We present conditions under which the distribution of each individual mixture component can be identified, and establish posterior contraction behavior for the density of the data population as well as for the densities of the latent mixture components. We develop an efficient MCMC algorithm for posterior inference and demonstrate, via simulation studies and real-world data illustrations, that it is possible to efficiently learn complex forms of probability distributions for the latent subpopulations. In theory, the posterior contraction rate of the component densities is nearly polynomial, which is a significant improvement over the logarithmic convergence rates obtained when estimating mixing measures via deconvolution.
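To make the generative structure concrete, the following is a minimal simulation sketch of a finite mixture whose components are themselves Dirichlet process mixtures. It is an illustration under stated assumptions, not the paper's model specification or its MCMC algorithm: the Gaussian kernel, the Gaussian base measure, the truncated stick-breaking approximation, and all names and default values (`alpha`, `truncation`, `centers`, the kernel scale `0.3`) are hypothetical choices made only for this example.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights approximating a Dirichlet process."""
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # close the stick so the weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_component(n, alpha, truncation, center, rng):
    """Draw n points from one latent component, itself a DP mixture of Gaussians."""
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Assumed base measure: atom locations are Gaussian around `center`.
    atom_means = rng.normal(center, 1.0, size=truncation)
    atoms = rng.choice(truncation, size=n, p=weights)
    # Assumed kernel: Gaussian with a fixed scale of 0.3.
    return rng.normal(atom_means[atoms], 0.3)

def sample_population(n, mixing_props, centers, alpha=1.0, truncation=20, seed=0):
    """Draw n points from a finite mixture of nonparametric components."""
    rng = np.random.default_rng(seed)
    # Top level: a finite mixture assigns each point to a latent subpopulation.
    labels = rng.choice(len(mixing_props), size=n, p=mixing_props)
    data = np.empty(n)
    for k, center in enumerate(centers):
        idx = labels == k
        data[idx] = sample_component(int(idx.sum()), alpha, truncation, center, rng)
    return data, labels

if __name__ == "__main__":
    data, labels = sample_population(500, [0.6, 0.4], [-3.0, 3.0])
    print(data.shape, np.bincount(labels))
```

Each call to `sample_component` draws one random component distribution and then samples from it, so the two levels of the hierarchy (finite mixing proportions on top, a nonparametric density within each subpopulation) stay visibly separate.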