Bill James' Pythagorean formula has for decades done an excellent job of estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted by ${\rm RS}$ and ${\rm RA}$ respectively, there is some $\gamma \approx 2$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One use is to assess the value of potential signings, since the formula lets us estimate how many more wins a team obtains over a season given an estimated change in runs scored and allowed. We summarize earlier work on the subject and extend the theoretical model of Miller, who assumed that the home and away teams' runs arise from independent Weibull distributions with the same shape parameter $\gamma$; this assumption has been found to describe the observed run data well and yields a win probability equivalent to James' formula. We instead model runs scored and allowed as being drawn from independent Weibull distributions with different shape parameters, and use the first and second moments to obtain a system of four equations in four unknowns. Doing so fits the training data better, yielding more accurate winning percentage predictions over the last 30 MLB seasons (1994 to 2023). This comes at a small cost: we no longer have a closed-form expression for the win probability, but must evaluate a two-dimensional integral of Weibull densities and numerically solve the system of equations. Both are straightforward with simple computer programs.
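The computations described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two-parameter Weibull distributions with no location shift, fits each distribution's shape and scale from a mean and a variance by bisection, and reduces the two-dimensional integral for the win probability to a one-dimensional one via the substitution $u = F_{\rm RS}(x)$. The function names (`pythagorean`, `fit_weibull_moments`, `win_probability`) and the numbers in the demo are hypothetical.

```python
import math


def pythagorean(rs, ra, gamma=2.0):
    """James' Pythagorean estimate of winning percentage."""
    return rs**gamma / (rs**gamma + ra**gamma)


def fit_weibull_moments(mean, var, lo=0.1, hi=50.0):
    """Match a Weibull's first two moments; returns (shape, scale).

    Assumption: no location shift, so mean = a*Gamma(1 + 1/k) and
    var/mean^2 = Gamma(1 + 2/k)/Gamma(1 + 1/k)^2 - 1, which is
    decreasing in the shape k and can be inverted by bisection.
    """
    target = var / mean**2

    def excess(k):
        return math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1 - target

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:   # spread still too large: shape must increase
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, mean / math.gamma(1 + 1 / k)


def win_probability(k_s, a_s, k_a, a_a, n=20000):
    """P(runs scored > runs allowed) for independent Weibulls.

    Uses P = integral over u in (0, 1) of F_allowed(F_scored^{-1}(u)),
    approximated with the midpoint rule on n subintervals.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        x = a_s * (-math.log(1.0 - u)) ** (1.0 / k_s)   # quantile of runs scored
        total += 1.0 - math.exp(-((x / a_a) ** k_a))    # CDF of runs allowed
    return total / n


if __name__ == "__main__":
    # Hypothetical per-game means and variances, for illustration only.
    k_s, a_s = fit_weibull_moments(mean=4.8, var=9.5)
    k_a, a_a = fit_weibull_moments(mean=4.3, var=8.0)
    print("different shapes:", win_probability(k_s, a_s, k_a, a_a))
    print("Pythagorean, gamma = 2:", pythagorean(4.8, 4.3))
```

When both shape parameters are equal to some $\gamma$, the integral has the closed form $a_{\rm RS}^\gamma / (a_{\rm RS}^\gamma + a_{\rm RA}^\gamma)$ in the scale parameters, which gives a convenient sanity check on the numerical routine.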