Quantifying Officiating Impact in the NBA: A Referee Impact Metric Analysis Using ESPN Win-Probability Data

Over the past century, basketball analytics has moved from simple box-score rates toward complex, context-aware measures that evaluate events by their expected effect on game outcomes. Officiating analysis has not made the same transition: existing work and public discussion still rely heavily on foul rates, foul differentials, correctness labels from reviewed late-game calls, or the benefit that teams and players derive from calls. This leaves an empirical gap, because a low-leverage foul in a decided game should not be treated as equivalent to a whistle that materially shifts win probability in a close game. To address this gap, we introduce the Ref Impact Metric (RIM), a game-level statistic that aggregates the absolute win-probability movement attached to foul events and thereby measures the impact associated with each referee in each game. Using ESPN game-summary and win-probability data for the 2021-2022 through 2024-2025 NBA seasons, we show that RIM is empirically distinct from both foul volume and foul disparity, describe the distribution of RIM across referees in the regular season and the postseason, and examine home/away, team-side, and referee-team heterogeneity. We then use linear controls deliberately as stress tests: conditioning on home status, team, opponent, season, and postseason series state asks which descriptive outliers persist after basic contextual adjustment. Several team-side and referee-team patterns remain visible after conditioning, but omitted-variable robustness diagnostics indicate that these patterns should be interpreted as observational screening signals rather than as evidence of intent, misconduct, or whistle-level responsibility on the part of any single official. Our contribution is foundational, and we emphasize that the framework should be tested against alternative win-probability models and with further causal-inference methods.
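To make the aggregation concrete, the following is a minimal Python sketch of a game-level sum of absolute win-probability shifts over foul events. It is an illustration under stated assumptions, not the paper's implementation: the `Play` record, its field names, and the `game_rim` function are hypothetical, win probability is assumed to be expressed as a fraction in [0, 1], and the abstract does not specify how shifts are attributed to individual officials or whether any normalization is applied.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Play:
    # Hypothetical fields for illustration; the ESPN feed has its own schema.
    home_wp_before: float  # home win probability before the play, in [0, 1]
    home_wp_after: float   # home win probability after the play, in [0, 1]
    is_foul: bool          # True if the play is a foul event


def game_rim(plays: Iterable[Play]) -> float:
    """Sum the absolute win-probability shift over the foul events of one game."""
    return sum(
        abs(p.home_wp_after - p.home_wp_before)
        for p in plays
        if p.is_foul
    )


# Two made-up games with the same foul count (two fouls each).
blowout = [
    Play(0.97, 0.98, True),
    Play(0.98, 0.98, True),
    Play(0.98, 0.99, False),  # non-foul play, ignored by the sum
]
close_game = [
    Play(0.50, 0.62, True),
    Play(0.62, 0.55, False),  # non-foul play, ignored by the sum
    Play(0.55, 0.44, True),
]

print(round(game_rim(blowout), 3))     # 0.01
print(round(game_rim(close_game), 3))  # 0.23
```

Both toy games contain two fouls, yet the sums differ by more than an order of magnitude, which is the sense in which a leverage-weighted statistic can diverge from foul volume.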
