Modern real-world datasets are often complex enough to require more advanced statistical models. In this article we study the problem of identifying clusters across related groups when additional covariate information is available. We formulate a novel Bayesian nonparametric approach based on mixture models, integrating ideas from the hierarchical Dirichlet process and the "single-atoms" dependent Dirichlet process. The proposed method is general and flexible: it accommodates both continuous and discrete covariates through the choice of appropriate kernel functions. We construct a robust and efficient Markov chain Monte Carlo (MCMC) algorithm that uses data augmentation to handle the intractable normalized weights. The flexibility of the model also lets us examine the relationship between covariates and clusters. In tests on both simulated and real-world datasets, the model identifies meaningful clusters across groups, providing useful insights for a range of applications.