The standard mixture modeling framework has been widely used to study heterogeneous populations by modeling them as a finite number of homogeneous sub-populations. However, the standard mixture model assumes that each data point belongs to one and only one mixture component, or cluster, and this assumption is unrealistic when data points have fractional membership in multiple clusters. Representing an observation as partly belonging to several groups is conceptually very different from representing it as belonging to a single group with uncertainty. To address this, various soft clustering approaches, or individual-level mixture models, have been developed. In this context, Heller et al. (2008) formulated the Bayesian partial membership model (PM) as an alternative structure for individual-level mixtures, which also captures partial membership in the form of attribute-specific mixtures. We propose using the PM for soft clustering of count data arising in football performance analysis and compare the results with those obtained from the mixed membership model and the finite mixture model. Learning and inference are carried out using Markov chain Monte Carlo methods. The method is applied to Serie A player data from the 2022/2023 season to estimate, based on each player's playing style, the positions on the field where the player tends to play in addition to the primary position. Applying the partial membership model to football data could have practical implications for coaches, talent scouts, team managers, and analysts, who can use the findings to inform decisions on team strategy, talent acquisition, and statistical research, and ultimately to improve performance and understanding in football.
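The distinction between uncertain membership and partial membership can be made concrete with a minimal sketch. It assumes a Poisson likelihood for the counts, which the abstract does not specify, and uses the general partial membership construction in which the component densities are raised to the membership weights and multiplied; for a Poisson likelihood this yields a Poisson with a weighted geometric mean of the cluster rates. All function names and numeric values below are hypothetical and purely illustrative.

```python
import math


def poisson_pmf(x, rate):
    """Poisson probability of observing count x at the given rate."""
    return math.exp(-rate) * rate**x / math.factorial(x)


def mixture_pmf(x, weights, rates):
    """Finite mixture: x comes from exactly one cluster, and the weights
    express uncertainty about which one."""
    return sum(w * poisson_pmf(x, r) for w, r in zip(weights, rates))


def partial_membership_rate(memberships, rates):
    """Partial membership (assumed Poisson case): the natural parameters
    log(rate_k) are averaged with the membership weights, giving a single
    blended rate prod_k rate_k ** g_k."""
    return math.exp(sum(g * math.log(r) for g, r in zip(memberships, rates)))


def partial_membership_pmf(x, memberships, rates):
    """Partial membership: x is drawn from one Poisson whose rate blends
    all clusters according to the memberships."""
    return poisson_pmf(x, partial_membership_rate(memberships, rates))


# Hypothetical example: two clusters with rates 1 and 4, equal weights.
rates = [1.0, 4.0]
weights = [0.5, 0.5]
blended = partial_membership_rate(weights, rates)  # approximately 2.0
p_mix = mixture_pmf(0, weights, rates)             # average of two pmfs
p_pm = partial_membership_pmf(0, weights, rates)   # pmf at the blended rate
```

With equal weights the mixture averages two distinct Poisson distributions and can be bimodal, whereas the partial membership version produces a single Poisson whose rate lies between the cluster rates.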