Real-time probability forecasts for binary outcomes are routine in sports, online experimentation, medicine, and finance. Retrospective narratives, however, often hinge on pathwise extremes: for example, a forecast that reaches $90\%$ for an event that ultimately does not occur. Standard pointwise calibration tools (e.g. reliability diagrams) do not quantify how frequently such extremes should occur under correct sequential calibration. Under this ideal, the forecast path $p_k=\Pr(Y=1\mid F_k)$ is a bounded martingale with terminal value $p_N=Y\in\{0,1\}$. We derive benchmark distributions for extreme-path functionals conditional on the terminal outcome, emphasizing the peak-on-loss statistic $M_N=\max_{k\le N} p_k$ given $Y=0$. For continuous-time martingales with continuous sample paths, we obtain an exact identity for $\Pr(\sup_{t\in[0,1]}p_t\ge x\mid Y=0)$. In discrete time, we prove sharp finite-sample bounds and an explicit correction decomposition that isolates terminal-step crossings (non-attainment) and overshoots. These formulas provide model-agnostic null targets and one-sided tail probabilities (exact in the continuous-path setting; conservative in discrete time) for diagnosing sequential miscalibration from extreme-path behavior. We also develop competitive extensions tailored to win-probability feeds, including the eventual loser's peak win probability in two-outcome contests and the eventual winner's trough in $n$-outcome contests. An empirical illustration using ESPN win-probability series for NFL and NBA regular-season games (2018-2024) finds broad agreement with the benchmark in the NFL and systematic departures in the NBA.
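As a rough illustration of the kind of benchmark described above, the following sketch simulates a toy calibrated forecast and compares the empirical peak-on-loss tail with a closed form. Everything here is an assumption made for illustration and is not taken from the paper: the forecast is a symmetric random walk on the grid $\{0, 1/m, \dots, 1\}$ absorbed at $0$ or $1$ (so the horizon is a random stopping time rather than a fixed $N$), and the closed form $\Pr(\max_k p_k \ge x \mid Y=0) = \frac{p_0(1-x)}{x(1-p_0)}$ for $x \ge p_0$ is the standard optional-stopping calculation for a path that cannot jump over $x$. The abstract does not state its identity explicitly, so this formula should be read as a plausible form of it, not a quotation. The function names `simulate_peak_on_loss`, `benchmark_tail`, and `empirical_tail` are hypothetical.

```python
import random


def simulate_peak_on_loss(start, size, n_paths, seed=0):
    """Simulate a toy calibrated forecast and return its peaks on losing paths.

    Assumption (illustrative only): the forecast is position/size for a
    symmetric +/-1 random walk started at `start` and absorbed at 0 or `size`.
    This is a bounded martingale whose terminal value is Y in {0, 1}.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_paths):
        pos = peak = start
        while 0 < pos < size:
            pos += 1 if rng.random() < 0.5 else -1
            peak = max(peak, pos)
        if pos == 0:  # keep only paths that end in a loss (Y = 0)
            peaks.append(peak / size)
    return peaks


def benchmark_tail(x, p0):
    """Optional-stopping value of Pr(peak >= x | Y = 0), assuming no overshoot."""
    if x <= p0:
        return 1.0
    return p0 * (1.0 - x) / (x * (1.0 - p0))


def empirical_tail(peaks, x):
    """Fraction of losing paths whose peak reached x (with float tolerance)."""
    return sum(1 for p in peaks if p >= x - 1e-12) / len(peaks)


if __name__ == "__main__":
    peaks = simulate_peak_on_loss(start=5, size=10, n_paths=20000, seed=1)
    for x in (0.6, 0.8, 0.9):
        print(x, round(empirical_tail(peaks, x), 4), round(benchmark_tail(x, 0.5), 4))
```

Because the walk moves one grid point at a time, it hits each grid threshold exactly, so the toy model has no overshoot and the closed form should match the simulation up to Monte Carlo error. Forecast feeds that jump between updates would not satisfy this, which is the situation the discrete-time bounds and correction terms in the abstract are meant to address.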