Statistical modelling of data organized in groups is a crucial task in Bayesian statistics. This paper proposes a mixture model based on a novel family of Bayesian priors designed for multilevel data and obtained by normalizing a finite point process. In particular, the work extends the popular Mixture of Finite Mixtures model to the hierarchical framework in order to capture heterogeneity within and between groups. A full distribution theory for this new family and the induced clustering is developed, including the marginal, posterior, and predictive distributions. Efficient marginal and conditional Gibbs samplers are designed to provide posterior inference. The proposed mixture model outperforms the Hierarchical Dirichlet Process, the leading tool for handling multilevel data, in terms of analytical tractability, cluster discovery, and computational time. The motivating application is the analysis of shot put data, which contain performance measurements of athletes across different seasons. In this setting, the proposed model is used to cluster the observations across seasons and athletes. By linking clusters across seasons, similarities and differences in athletes' performances are identified.
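To make the idea of normalizing a finite point process concrete, the following is a minimal sketch of one way such a hierarchical prior could be simulated. It is not the paper's exact construction: the shifted Poisson prior on the number of components, the Gamma-distributed unnormalized jumps, the standard normal base measure, and the names `sample_hierarchical_weights`, `lam`, and `gamma` are all assumptions made for illustration.

```python
import numpy as np


def sample_hierarchical_weights(n_groups, lam=3.0, gamma=1.0, seed=0):
    """Draw shared atoms and group-specific mixture weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Number of components: 1 + Poisson(lam), so at least one component (assumed prior).
    m = 1 + rng.poisson(lam)
    # Component locations shared by all groups (assumed standard normal base measure).
    atoms = rng.normal(0.0, 1.0, size=m)
    # Unnormalized jumps, independent Gamma(gamma, 1) for each group and component.
    jumps = rng.gamma(shape=gamma, scale=1.0, size=(n_groups, m))
    # Normalizing each row turns the jumps into that group's mixture weights.
    weights = jumps / jumps.sum(axis=1, keepdims=True)
    return atoms, weights


atoms, weights = sample_hierarchical_weights(n_groups=4)
```

In this sketch every group uses the same atoms, which is what allows clusters to be linked across groups, while each group has its own weights, which is what allows heterogeneity between groups.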