Although pitch sequencing is a central topic in baseball analytics, previous studies have focused primarily on optimizing the final pitch within a single plate appearance, leaving the role of the preceding setup pitches and their impact on season-level performance insufficiently examined. To address this gap, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or a swing-out. Counterfactual pitch sequences were then generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that optimizing both final and setup pitches may substantially influence season-level performance, including improvements of more than 1.0 in K/9. The analyses also provided several practical insights, including the effective locations specific to each velocity band, the importance of pitch command, and the expansion of pitch-selection options through middle-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.
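The counterfactual selection step described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `Pitch` record with a pitch type and a coarse location zone, and it uses a hand-written `toy_predictor` as a stand-in for the trained Transformer; none of these names, features, or probabilities come from the study. The sketch only shows the mechanics: replace one pitch in the sequence (final or setup), keep everything else fixed, and keep the candidate with the lowest predicted in-play probability.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Pitch:
    """Hypothetical pitch record; the real model uses richer Statcast context."""
    pitch_type: str  # e.g. "FF", "SL", "CH"
    zone: int        # assumed coarse location bucket (1..9 grid)


def best_counterfactual(
    sequence: Sequence[Pitch],
    index: int,
    pitch_types: Sequence[str],
    zones: Sequence[int],
    predict_in_play: Callable[[List[Pitch]], float],
) -> Tuple[List[Pitch], float]:
    """Replace the pitch at `index` with every (type, zone) alternative and
    return the sequence that minimizes the predicted in-play probability."""
    best_seq: List[Pitch] = list(sequence)
    best_p = float("inf")
    for pitch_type, zone in product(pitch_types, zones):
        candidate = list(sequence)           # surrounding pitches stay fixed
        candidate[index] = Pitch(pitch_type, zone)
        p = predict_in_play(candidate)
        if p < best_p:                       # ties keep the first candidate
            best_seq, best_p = candidate, p
    return best_seq, best_p


def toy_predictor(seq: List[Pitch]) -> float:
    """Stand-in for the trained model (assumption, not the paper's model)."""
    final = seq[-1]
    p = 0.5
    if final.zone == 5:                      # middle of the zone: easier to hit
        p += 0.2
    if len(seq) >= 2 and seq[-2].pitch_type == final.pitch_type:
        p += 0.1                             # repeating the setup pitch type
    if final.pitch_type == "SL":
        p -= 0.1
    return p


observed = [Pitch("FF", 5), Pitch("FF", 5)]
types, zones = ["FF", "SL", "CH"], [1, 5, 9]

# Optimize the final pitch (index -1) and, separately, the setup pitch (index -2).
final_seq, final_p = best_counterfactual(observed, -1, types, zones, toy_predictor)
setup_seq, setup_p = best_counterfactual(observed, -2, types, zones, toy_predictor)
```

In the study, the per-pitch gains from such selections are then mapped to season statistics through regression models; that step is not reproduced here.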