Paired comparison models are useful for estimating latent abilities or preferences from binary outcomes, but maximum likelihood estimation can be unstable or fail when the comparison graph is disconnected or nearly separated. Ridge regularization addresses these difficulties by shrinking ability parameters toward a common center, but it can obscure the simple likelihood interpretation that makes Bradley-Terry and Thurstone-Mosteller models attractive to practitioners. This paper describes two data-augmentation perspectives on regularization. The first adds fractional pseudo-games between every pair of competitors. The second adds a fixed-strength phantom player and gives each real competitor a weighted pseudo-win and pseudo-loss against that player. Both approaches yield finite, shrunken estimates; the phantom-player construction also resolves the usual location nonidentifiability without an explicit linear constraint. For the Bradley-Terry model, the two augmentations lead to transparent penalty functions that can be compared directly with ridge penalties. An application to the 2025 Major League Baseball regular season illustrates that tuned pseudo-game and phantom-player regularization can closely reproduce ridge-regularized strength estimates while retaining an intuitive augmented-data representation.
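As a rough illustration of the two augmentations for the Bradley-Terry model, the sketch below fits log-strengths by plain gradient ascent on the augmented log-likelihood. Several details are assumptions, not taken from the paper: the function and parameter names are hypothetical, the phantom player is fixed at log-strength 0, each real competitor receives `phantom_weight` pseudo-wins and the same number of pseudo-losses against it, each pair of competitors receives `pair_weight` pseudo-wins in each direction, and gradient ascent stands in for whatever optimizer the authors actually use.

```python
import math

def sigmoid(x):
    """Logistic function: Bradley-Terry win probability for a log-strength gap x."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_bradley_terry(n, games, phantom_weight=0.0, pair_weight=0.0,
                      steps=5000, lr=0.1):
    """Fit log-strengths for n competitors from (winner, loser) index pairs.

    Hypothetical helper: the weights below are illustrative assumptions.
    phantom_weight: pseudo-wins AND pseudo-losses for each competitor
                    against a phantom player fixed at log-strength 0.
    pair_weight:    pseudo-wins in EACH direction for every pair.
    """
    theta = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        # Real games: d/d(theta_w) of log sigmoid(theta_w - theta_l).
        for winner, loser in games:
            miss = 1.0 - sigmoid(theta[winner] - theta[loser])
            grad[winner] += miss
            grad[loser] -= miss
        for i in range(n):
            # Phantom player: one weighted win plus one weighted loss at gap theta_i.
            grad[i] += phantom_weight * (1.0 - 2.0 * sigmoid(theta[i]))
            # Pseudo-games: one weighted win each way between i and j.
            for j in range(i + 1, n):
                g = pair_weight * (1.0 - 2.0 * sigmoid(theta[i] - theta[j]))
                grad[i] += g
                grad[j] -= g
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Toy data: competitor 0 is undefeated, so the unregularized MLE diverges.
games = [(0, 1), (0, 1), (1, 2), (1, 2), (0, 2)]
phantom_fit = fit_bradley_terry(3, games, phantom_weight=1.0)
pseudo_fit = fit_bradley_terry(3, games, pair_weight=1.0)
print("phantom player:", phantom_fit)
print("pseudo-games:  ", pseudo_fit)
```

In this toy example both augmentations keep the undefeated competitor's estimate finite. Under these assumptions the phantom-player term contributes a penalty of 2·`phantom_weight`·log(2 cosh(θ_i/2)) per competitor, which behaves like a ridge penalty near zero and pins the location of the scale without a sum-to-zero constraint.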