In recent years, the baseball statistic "Wins Above Replacement" (WAR) has emerged as one of the most popular evaluation metrics. Unlike fundamental counting statistics such as batting average, strikeouts, or home runs, WAR is not directly observed and tabulated; it is an estimate of a parameter in a vaguely defined model with its attendant assumptions. The industry-standard models of WAR for starting pitchers, from FanGraphs and Baseball Reference, both assume that season-long averages are sufficient statistics for a pitcher's performance. This assumption is mathematically unsound for many reasons, chiefly because WAR is not linear with respect to any counting statistic; in particular, WAR must be a convex function of the number of runs allowed in a game. To repair this defect (among many others), we devise a new measure, Grid WAR (GWAR), which estimates a starting pitcher's WAR on a per-game basis. We then define a starting pitcher's seasonal GWAR as the sum of the GWAR of each of his games. Formulated this way, GWAR is indeed a convex function of runs allowed. We find that averaging pitcher performance over an entire season tends to undervalue worse pitchers and overvalue better pitchers. This is because the convexity of GWAR diminishes the seasonal impact of any single game in which a pitcher allows many runs. Moreover, we show that Grid WAR has predictive as well as historical value: a pitcher's historical Grid WAR is better than WAR at predicting his future performance. Finally, at https://gridwar.xyz we host a Shiny app that displays the Grid WAR results of each MLB game since 1952 at the career, season, and game levels, and that updates automatically every morning.
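The convexity argument can be illustrated with a minimal sketch. The value function `game_value` below is a hypothetical convex, decreasing curve chosen only for illustration; it is not the Grid WAR model, and the two run lines are invented. The sketch compares summing per-game values against applying the same function to the season average, which by Jensen's inequality can never give the larger total when the function is convex.

```python
from statistics import mean


def game_value(runs_allowed: float) -> float:
    """Hypothetical value of one start (an assumption, NOT the Grid WAR model).

    The curve is decreasing and convex in runs allowed: each additional
    run costs less than the one before it.
    """
    return 0.9 * 0.75 ** runs_allowed


def per_game_total(runs_by_game: list[int]) -> float:
    """Season value as the sum of per-game values."""
    return sum(game_value(r) for r in runs_by_game)


def season_average_total(runs_by_game: list[int]) -> float:
    """Season value computed from the average runs allowed per game."""
    return len(runs_by_game) * game_value(mean(runs_by_game))


# Two invented pitchers with the same average of 3 runs allowed per start.
steady = [3, 3, 3, 3]
streaky = [0, 0, 0, 12]

for name, runs in (("steady", steady), ("streaky", streaky)):
    print(name, per_game_total(runs), season_average_total(runs))
```

Under these assumptions the two pitchers look identical to a season-average method, while the per-game sum credits the streaky pitcher with more value, because his one disastrous start costs far less than twelve evenly spread runs would.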