Real-time probability forecasts for binary outcomes are routine in sports, online experimentation, medicine, and finance. Retrospective narratives, however, often hinge on pathwise extremes: for example, a forecast that becomes "90% certain" for an event that ultimately does not occur. Standard pointwise calibration tools do not quantify how frequently such extremes should arise under correct sequential calibration, where the ideal forecast sequence is a bounded martingale that ends at the realized outcome. We derive benchmark distributions for extreme-path functionals conditional on the terminal outcome, emphasizing the peak-on-loss: the largest forecast value attained along realizations that end in failure. In continuous time with continuous paths we obtain an exact closed-form benchmark; in discrete time we prove sharp finite-sample bounds together with an explicit correction decomposition that isolates terminal-step crossings and overshoots. These results yield model-agnostic null targets and one-sided tail probabilities for diagnosing sequential miscalibration from extreme-path behavior. We also develop competitive extensions tailored to win-probability feeds and illustrate the approach using ESPN win-probability series for NFL and NBA regular-season games (2018-2024), finding broad agreement with the benchmark in the NFL and systematic departures in the NBA.
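To make the peak-on-loss benchmark concrete, here is a minimal sketch under stated assumptions rather than the paper's own derivation or code. For a continuous-path martingale forecast that starts at p0 and ends at 0 or 1, a standard optional-stopping argument gives P(peak >= q | outcome = 0) = p0(1 - q) / (q(1 - p0)) for q >= p0: the path reaches q before 0 with probability p0/q, and from q it ends at 0 with probability 1 - q. The abstract does not state its closed form, so treating this expression as the benchmark is an assumption. The function names `peak_on_loss_tail` and `simulate_peak_on_loss` are hypothetical, and the simulation uses a nearest-neighbour random walk on a grid, chosen because it cannot overshoot a grid level and so reproduces the continuous-path value exactly at grid thresholds.

```python
import random


def peak_on_loss_tail(p0: float, q: float) -> float:
    """Optional-stopping value of P(max forecast >= q | outcome = 0).

    Assumes a continuous-path martingale started at p0 that ends in {0, 1}.
    This formula is an illustrative assumption, not quoted from the paper.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if q <= p0:
        return 1.0  # the path already starts at or above the threshold
    return p0 * (1.0 - q) / (q * (1.0 - p0))


def simulate_peak_on_loss(n_grid: int, start: int, level: int,
                          n_paths: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same conditional probability.

    The forecast is k / n_grid, where k is a symmetric +/-1 walk absorbed
    at 0 (loss) or n_grid (win). Steps of size one cannot jump over a grid
    level, so there is no overshoot correction at the threshold.
    """
    rng = random.Random(seed)
    losses = 0
    hits = 0
    for _ in range(n_paths):
        k = start
        peak = start
        while 0 < k < n_grid:
            k += 1 if rng.random() < 0.5 else -1
            if k > peak:
                peak = k
        if k == 0:  # path ended in failure
            losses += 1
            if peak >= level:
                hits += 1
    if losses == 0:
        raise RuntimeError("no losing paths were simulated")
    return hits / losses


if __name__ == "__main__":
    # Forecast starts at 0.5; how often does a losing path touch 0.9?
    print(peak_on_loss_tail(0.5, 0.9))
    print(simulate_peak_on_loss(n_grid=10, start=5, level=9, n_paths=20000))
```

Under these assumptions, a forecast that starts at 50% should reach "90% certain" on roughly one in nine of the realizations that end in failure. With coarser or irregular discrete updates the forecast can jump past the threshold, which is the situation the abstract's finite-sample bounds and overshoot corrections are meant to cover.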