Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will eventually raise a false alarm, even when the model remains perfectly stable. Existing methods typically lack formal error guarantees, conflate alarm time with changepoint location, and monitor indirect signals that do not fully characterize calibration. We present PITMonitor, an anytime-valid, calibration-specific monitor that detects distributional shifts in probability integral transforms via a mixture e-process, providing Type I error control over an unbounded monitoring horizon as well as Bayesian changepoint estimation. On the river library's FriedmanDrift benchmark, PITMonitor achieves detection rates competitive with the strongest baselines across all three scenarios, although its detection delay is substantially longer under local drift.