For decades, Bill James' Pythagorean formula has done an excellent job of estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted by ${\rm RS}$ and ${\rm RA}$, respectively, then there is some $\gamma$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One important application is valuing individual players, since the formula lets us estimate how many additional wins a team would earn from a fixed increase in run production. We summarize earlier work on the subject and extend the theoretical model of Miller, who modeled runs scored and allowed as independent Weibull random variables with a common shape parameter, a choice that has been observed to fit run data well. We instead allow the two shape parameters to differ, and use the Method of Moments to solve a system of four equations in four unknowns. The resulting predicted winning percentage is consistently better than that of earlier models over the last 30 MLB seasons (1994 to 2023). This comes at a small cost: we no longer have a closed-form expression, but must evaluate a two-dimensional integral involving two Weibull densities and numerically solve the system of equations. As both tasks are trivial with simple computational programs, it is well worth adopting this framework, which also avoids the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
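The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes unshifted two-parameter Weibull distributions, assumes the four moment equations match the sample mean and variance of runs scored and of runs allowed, and replaces the two-dimensional integral by the equivalent one-dimensional form $P(X > Y) = \int_0^1 F_Y(F_X^{-1}(u))\,du$, which holds for independent continuous $X$ and $Y$. All function names are hypothetical.

```python
import math


def pythagorean(rs, ra, gamma):
    """Pythagorean estimate RS^gamma / (RS^gamma + RA^gamma)."""
    return rs**gamma / (rs**gamma + ra**gamma)


def weibull_mean_var(scale, shape):
    """Mean and variance of an unshifted Weibull(scale, shape)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale**2 * (g2 - g1**2)


def fit_weibull_moments(mean, var, lo=0.2, hi=50.0):
    """Method-of-moments fit matching mean and variance (an assumed choice of moments)."""
    target = var / mean**2  # squared coefficient of variation; depends on shape only

    def excess(shape):
        g1 = math.gamma(1.0 + 1.0 / shape)
        g2 = math.gamma(1.0 + 2.0 / shape)
        return g2 / g1**2 - 1.0 - target

    if excess(lo) < 0 or excess(hi) > 0:
        raise ValueError("variance/mean^2 is outside the bracketed shape range")
    for _ in range(100):  # bisection; excess() decreases as the shape grows
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    return mean / math.gamma(1.0 + 1.0 / shape), shape


def win_probability(scale_rs, shape_rs, scale_ra, shape_ra, n=20000):
    """P(runs scored > runs allowed) for independent Weibulls, by the midpoint rule."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        # Quantile of the runs-scored distribution at probability u.
        x = scale_rs * (-math.log1p(-u)) ** (1.0 / shape_rs)
        # CDF of the runs-allowed distribution evaluated at that quantile.
        total += 1.0 - math.exp(-((x / scale_ra) ** shape_ra))
    return total / n


def predicted_win_pct(mean_rs, var_rs, mean_ra, var_ra):
    """Fit both Weibulls by moments, then integrate numerically for the win probability."""
    scale_rs, shape_rs = fit_weibull_moments(mean_rs, var_rs)
    scale_ra, shape_ra = fit_weibull_moments(mean_ra, var_ra)
    return win_probability(scale_rs, shape_rs, scale_ra, shape_ra)
```

A call such as `predicted_win_pct(4.8, 9.5, 4.3, 8.7)` takes illustrative (not real) per-game means and variances. When the two fitted shapes coincide, `win_probability` reduces to the Pythagorean form with the scale parameters in place of ${\rm RS}$ and ${\rm RA}$, which gives a quick sanity check on the numerical integration.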