For decades, Bill James' Pythagorean formula has done an excellent job of estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted by ${\rm RS}$ and ${\rm RA}$, respectively, then there is some exponent $\gamma$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One important application is assessing the value of different players to a team, since the formula lets us estimate how many more wins a team would have given a fixed increase in run production. We summarize earlier work on the subject and extend the theoretical model of Miller, who modeled runs scored and allowed as independent Weibull random variables with the same shape parameter, a model that has been observed to describe run data well. We instead model runs scored and allowed as draws from independent Weibull distributions whose shape parameters are not necessarily equal, and use the Method of Moments to solve a system of four equations in four unknowns. Doing so yields predicted winning percentages that are consistently better than those of earlier models over the last 30 MLB seasons (1994 to 2023). This comes at a small cost: we no longer have a closed-form expression, but must evaluate a two-dimensional integral involving two Weibull densities and numerically solve the system of equations. As both tasks are trivial with simple computational programs, it is well worth adopting this framework, which also avoids the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
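To make the computation concrete, the following is a minimal Python sketch of the kind of calculation described above, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: that the four equations match the sample mean and variance of runs scored and of runs allowed to an unshifted two-parameter Weibull distribution, and that the winning percentage is the probability that the runs-scored variable exceeds the runs-allowed variable. The function names, the bisection bounds, and the default exponent of 1.83 are illustrative choices. The two-dimensional integral is reduced to a one-dimensional one via $P(X > Y) = \int_0^1 F_Y(F_X^{-1}(u))\,du$.

```python
import math


def pythagorean(rs, ra, gamma=1.83):
    """Classical Pythagorean estimate; the default exponent is illustrative only."""
    return rs**gamma / (rs**gamma + ra**gamma)


def fit_weibull_moments(mean, var):
    """Match a two-parameter Weibull (shape, scale) to a given mean and variance.

    Assumption: an unshifted Weibull, for which
        mean = scale * G(1 + 1/shape)
        var  = scale^2 * (G(1 + 2/shape) - G(1 + 1/shape)^2)
    with G the gamma function. The squared coefficient of variation depends on
    the shape alone and decreases as the shape grows, so bisection suffices.
    """
    target = var / mean**2
    lo, hi = 0.2, 50.0  # hypothetical search bounds for the shape parameter
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        cv2 = math.gamma(1 + 2 / mid) / math.gamma(1 + 1 / mid) ** 2 - 1
        if cv2 > target:
            lo = mid  # too much spread: a larger shape is needed
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1 + 1 / shape)
    return shape, scale


def win_probability(scale_rs, shape_rs, scale_ra, shape_ra, n=20000):
    """Estimate P(X > Y) for independent Weibulls X (scored) and Y (allowed).

    Uses P(X > Y) = integral over u in (0, 1) of F_Y(Q_X(u)), where Q_X is the
    quantile function of X, evaluated with the midpoint rule on n cells.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        x = scale_rs * (-math.log1p(-u)) ** (1 / shape_rs)  # Q_X(u)
        total += 1.0 - math.exp(-((x / scale_ra) ** shape_ra))  # F_Y(x)
    return total / n


def predicted_win_pct(mean_rs, var_rs, mean_ra, var_ra):
    """Fit both Weibulls by moments, then integrate for the win probability."""
    shape_rs, scale_rs = fit_weibull_moments(mean_rs, var_rs)
    shape_ra, scale_ra = fit_weibull_moments(mean_ra, var_ra)
    return win_probability(scale_rs, shape_rs, scale_ra, shape_ra)
```

When the two shape parameters coincide, `win_probability` should agree with the closed form `scale_rs**k / (scale_rs**k + scale_ra**k)`, which gives a quick sanity check on the numerical integration before applying it with unequal shapes.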