Bill James' Pythagorean formula has for decades done an excellent job of estimating a team's winning percentage from very little data: if the average runs scored and allowed are denoted by ${\rm RS}$ and ${\rm RA}$ respectively, there is some $\gamma$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One important application is determining the value of different players to the team, as the formula allows us to estimate how many more wins a team would have given a fixed increase in run production. We summarize earlier work on the subject and extend the theoretical model of Miller, who modeled runs scored and allowed as being drawn from independent Weibull distributions with the same shape parameter, a model that has been observed to describe run data well. We instead model runs scored and allowed as being drawn from independent Weibull distributions whose shape parameters are not necessarily the same, and then use the Method of Moments to solve a system of four equations in four unknowns. Doing so yields a predicted winning percentage that is often better than that of earlier models. This comes at a small cost: we no longer have a closed-form expression, but must evaluate a two-dimensional integral involving two Weibull distributions and numerically estimate the solutions to the system of equations. As these are trivial to do with simple computational programs, it is well worth adopting this framework and avoiding the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
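To make the computational claim concrete, the following is a minimal sketch of the pipeline the abstract describes, not the authors' implementation. It rests on several assumptions that are not stated in the abstract: the four equations are taken to be the mean and variance of runs scored and of runs allowed matched to the Weibull mean and variance; both distributions share a hypothetical location shift `beta` (default 0); and the two-dimensional integral is reduced to a one-dimensional one by integrating out the runs-allowed variable, then approximated with a midpoint rule. All function names are illustrative.

```python
import math


def pythagorean(rs, ra, gamma):
    """Pythagorean estimate RS^gamma / (RS^gamma + RA^gamma)."""
    return rs**gamma / (rs**gamma + ra**gamma)


def _cv2(k):
    """Squared coefficient of variation of a Weibull with shape k."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return g2 / (g1 * g1) - 1.0


def fit_weibull_moments(mean, var, beta=0.0, lo=0.2, hi=50.0):
    """Match a sample mean and variance; return (scale alpha, shape k).

    beta is an assumed location shift shared by both distributions.
    """
    target = var / (mean - beta) ** 2
    # _cv2 decreases as k grows, so bisect on the shape parameter.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    alpha = (mean - beta) / math.gamma(1.0 + 1.0 / k)
    return alpha, k


def win_probability(alpha_rs, k_rs, alpha_ra, k_ra, n=20000):
    """Approximate P(X > Y) for independent Weibulls with a common location.

    Uses P(X > Y) = integral over u in (0, 1) of F_Y(Q_X(u)), where Q_X is
    the quantile function of X, evaluated with a midpoint rule on n cells.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # midpoint of the i-th cell
        t = alpha_rs * (-math.log1p(-u)) ** (1.0 / k_rs)  # quantile of X at u
        total += 1.0 - math.exp(-((t / alpha_ra) ** k_ra))  # CDF of Y at t
    return total / n
```

When the two fitted shapes coincide, `win_probability` should agree with `pythagorean` applied to the two scale parameters, which is a convenient sanity check; when they differ, no closed form is available and the numerical value is the prediction.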