The standard mixture modelling framework has been widely used to study heterogeneous populations by modelling them as a finite number of homogeneous sub-populations. However, the standard mixture model assumes that each data point belongs to one and only one mixture component, or cluster. This assumption is unrealistic when data points have fractional membership in multiple clusters. Representing an observation as partly belonging to several groups is conceptually very different from representing it as belonging to a single group with uncertainty. To address this, various soft clustering approaches, or individual-level mixture models, have been developed. In this context, Heller et al. (2008) formulated the Bayesian partial membership model (PM) as an alternative structure for individual-level mixtures, which also captures partial membership in the form of attribute-specific mixtures but does not assume a factorization over attributes. Our work proposes using the PM for soft clustering of count data arising in football performance analysis and compares the results with those obtained from the mixed membership model and the finite mixture model. Learning and inference are carried out using Markov chain Monte Carlo methods. The method is applied to Serie A football player data from the 2022/2023 season to estimate, based on playing style, the positions on the field where players tend to play in addition to their primary position. Applying the partial membership model to football data could have practical implications for coaches, talent scouts, team managers and analysts, who can use the findings to inform decisions on team strategy, talent acquisition, and statistical research.
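To make the distinction between partial membership and uncertain membership concrete, the sketch below contrasts the two likelihoods for count data. It assumes independent Poisson attributes within each cluster, which is an illustrative choice not stated here, and it uses the exponential-family form of the PM from Heller et al. (2008), in which an observation's natural parameters are a membership-weighted combination of the cluster parameters; for a Poisson this amounts to averaging log-rates. All function names and the toy rates are hypothetical, and the code covers the likelihoods only, not the MCMC inference.

```python
import numpy as np
from math import lgamma

# Vectorised log-gamma, used for the log(x!) term of the Poisson pmf.
_lgamma = np.vectorize(lgamma)


def poisson_logpmf(x, rate):
    """Elementwise Poisson log-probability (x and rate broadcast together)."""
    x = np.asarray(x, dtype=float)
    rate = np.asarray(rate, dtype=float)
    return x * np.log(rate) - rate - _lgamma(x + 1.0)


def mixture_loglik(x, weights, rates):
    """Finite mixture: x comes from exactly ONE cluster, chosen with prob. weights.

    rates has shape (K, D): K clusters, D count attributes (assumed independent).
    """
    per_cluster = poisson_logpmf(x, rates).sum(axis=1)   # shape (K,)
    a = per_cluster + np.log(weights)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())               # log-sum-exp


def partial_membership_rates(weights, rates):
    """PM: blend clusters in natural-parameter (log-rate) space."""
    return np.exp(np.asarray(weights) @ np.log(rates))   # shape (D,)


def partial_membership_loglik(x, weights, rates):
    """PM: x comes from ONE Poisson whose rates blend all clusters."""
    blended = partial_membership_rates(weights, rates)
    return poisson_logpmf(x, blended).sum()


if __name__ == "__main__":
    # Hypothetical toy rates: two clusters, two count attributes.
    rates = np.array([[4.0, 1.0],
                      [1.0, 4.0]])
    half = np.array([0.5, 0.5])
    x = np.array([2, 2])  # a player "in between" the two profiles

    print("blended PM rates:", partial_membership_rates(half, rates))
    print("mixture log-lik :", mixture_loglik(x, half, rates))
    print("PM log-lik      :", partial_membership_loglik(x, half, rates))
```

With equal memberships the PM places the observation at intermediate rates (here the geometric mean of the two cluster rates), whereas the finite mixture still treats it as drawn from one of the two extreme profiles, each with probability one half.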