Modelling intransitivity in pairwise comparisons with application to baseball data

The seminal Bradley-Terry model exhibits transitivity, i.e., the property that the probabilities of player A beating B and of B beating C determine the probability of A beating C, with these probabilities given by a skill parameter for each player. Such transitive models do not account for the different strategies of play between each pair of players, which give rise to {\it intransitivity}. Various intransitive parametric models have been proposed, but they lack the flexibility to capture the different strategies across $n$ players, since the $O(n^2)$ values of intransitivity are modelled using only $O(n)$ parameters, and yet they are not parsimonious when the intransitivity is simple. We overcome this lack of adaptability by allocating each pair of players to one of a random number, $K$, of intransitivity levels, each level representing a different strategy. Our novel approach for the skill parameters allocates the $n$ players to a random number, $A<n$, of distinct skill levels, to improve efficiency and avoid false rankings. Although we may have to estimate up to $O(n^2)$ unknown parameters for $(A,K)$, we anticipate that in many practical contexts $A+K < n$. Within a Bayesian hierarchical model, $(A,K)$ are treated as unknown, and inference is conducted via a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. Our semi-parametric model, which reduces to the Bradley-Terry model when $(A=n-1, K=0)$, is shown to fit better than both the Bradley-Terry model and the existing intransitivity models in out-of-sample testing on simulated data and on American League baseball data. Supplementary materials for the article are available online.
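To make the distinction between transitive and intransitive comparisons concrete, here is a minimal sketch. It assumes a logistic (Bradley-Terry) link with an additive, antisymmetric pair-level intransitivity term, i.e. logit P(i beats j) = s_i - s_j + theta_ij with theta_ji = -theta_ij. The abstract does not give the exact parametrisation, so this functional form is an assumption, and the names `skill_levels`, `skill_alloc`, `intrans_levels`, `pair_alloc`, `theta` and all numeric values are hypothetical. The sketch does not implement the RJMCMC inference. It only shows how players sharing a small number of skill levels, combined with pairs allocated to an intransitivity level, can produce a cycle that plain Bradley-Terry cannot.

```python
import math

def win_prob(s_i, s_j, theta_ij=0.0):
    """Assumed logistic model: logit P(i beats j) = s_i - s_j + theta_ij.
    With theta_ij = 0 this is the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j + theta_ij)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical: 3 players sharing 2 distinct skill levels.
skill_levels = [0.0, 1.0]                     # values of the skill levels
skill_alloc = {"P1": 1, "P2": 1, "P3": 0}     # player -> skill level index
skill = {p: skill_levels[a] for p, a in skill_alloc.items()}

# Hypothetical: one non-zero intransitivity level (1.5) plus a zero level.
# Each ordered key (i, j) says the level's value applies in favour of i.
intrans_levels = [0.0, 1.5]
pair_alloc = {("P1", "P2"): 1, ("P2", "P3"): 0, ("P3", "P1"): 1}

def theta(i, j):
    """Antisymmetric pair-level intransitivity: theta(j, i) == -theta(i, j)."""
    if (i, j) in pair_alloc:
        return intrans_levels[pair_alloc[(i, j)]]
    return -intrans_levels[pair_alloc[(j, i)]]

def p_beats(i, j, intransitive=True):
    t = theta(i, j) if intransitive else 0.0
    return win_prob(skill[i], skill[j], t)

# Bradley-Terry: odds multiply along a chain, so P(P1>P2) and P(P2>P3)
# fully determine P(P1>P3).
bt_chain = odds(p_beats("P1", "P2", False)) * odds(p_beats("P2", "P3", False))
bt_direct = odds(p_beats("P1", "P3", False))
print(math.isclose(bt_chain, bt_direct))  # True

# With the pair-level terms, each player beats the next with prob > 0.5: a cycle.
for i, j in [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]:
    print(f"P({i} beats {j}) = {p_beats(i, j):.3f}")
# P(P1 beats P2) = 0.818
# P(P2 beats P3) = 0.731
# P(P3 beats P1) = 0.622
```

Setting every pair to the zero level (the analogue of $K=0$ in this sketch) recovers the transitive Bradley-Terry probabilities, while P1 and P2 sharing one skill level illustrates how tied skills avoid imposing a spurious ranking between them.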
