Reliable probabilities are critical in high-risk applications. Yet common calibration criteria (confidence and class-wise calibration) are necessary but not sufficient for full distributional calibration, and post-hoc calibration methods often lack distribution-free guarantees. We propose a set-based notion of calibration, cumulative mass calibration, together with a corresponding empirical error measure, the Cumulative Mass Calibration Error (CMCE). We develop a calibration procedure that first uses conformal prediction to obtain a set of labels achieving the desired coverage. We then instantiate two simple post-hoc calibrators, a mass-normalization rule and a temperature-scaling rule, each tuned to the conformal constraint. On multi-class image benchmarks, especially those with a large number of classes, our methods consistently improve CMCE and standard metrics (ECE, cw-ECE, MCE) over baselines, yielding a practical, scalable framework with theoretical guarantees.
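As a rough illustration of the first step described in the abstract (obtaining a label set with the desired coverage), the following is a minimal sketch of standard split conformal prediction. It is not the authors' implementation: the nonconformity score `1 - p(true label)`, the function names `conformal_threshold` and `prediction_sets`, and the toy data are all assumptions made for illustration. The abstract does not specify the score used, nor the exact form of the two calibrators or of CMCE, so those are deliberately left out.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for target coverage 1 - alpha.

    Assumed score (not taken from the abstract): 1 - probability of the true label.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every label must be included.
        return np.inf
    return np.sort(scores)[k - 1]


def prediction_sets(probs, qhat):
    """Boolean mask of shape (n_samples, n_classes): True where the label is in the set."""
    return (1.0 - probs) <= qhat


# Hypothetical toy calibration data: 4 samples, 3 classes.
cal_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.50, 0.25, 0.25],
])
cal_labels = np.array([0, 1, 2, 1])

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
test_probs = np.array([[0.50, 0.45, 0.05]])
sets = prediction_sets(test_probs, qhat)
print(qhat, sets)
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least `1 - alpha`. A post-hoc calibrator "tuned to the conformal constraint", as the abstract puts it, would then presumably adjust the predicted probabilities using such a set, but the precise rule is not given here.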