Statistical modeling in the presence of hierarchical data is a crucial task in Bayesian statistics. The Hierarchical Dirichlet Process (HDP) is the standard tool for handling data organized in groups through mixture modeling. Although the HDP is mathematically tractable, its computational cost is typically high, and its analytical complexity is a barrier for practitioners. This paper proposes a mixture model based on a novel family of Bayesian priors designed for multilevel data and obtained by normalizing a finite point process. A full distribution theory for this new family and the clustering it induces is developed, including tractable expressions for the marginal, posterior, and predictive distributions. Efficient marginal and conditional Gibbs samplers are designed for posterior inference. The proposed mixture model outperforms the HDP in terms of analytical tractability, cluster discovery, and computation time. The motivating application is the analysis of shot put data, which contain performance measurements of athletes across different seasons. In this setting, the proposed model is used to cluster the observations across seasons and athletes. Linking clusters across seasons identifies similarities and differences in athletes' performances.