Modern visual recognition models are often overconfident because they rely on complex deep neural networks and one-hot target supervision, so their confidence scores are unreliable and need calibration. Current confidence calibration techniques primarily address single-label scenarios, and the more practical and general multi-label setting has received little attention. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, which aims to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, which leads to semantic confusion and makes confidence scores even less reliable. Existing single-label calibration methods based on label smoothing fail to account for category correlations, which are crucial for addressing semantic confusion, and therefore perform sub-optimally. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category and uses them to measure semantic correlations across categories. From these correlations, we construct adaptive label vectors that assign higher values to strongly correlated categories, which enables more effective regularization. We establish an evaluation benchmark by re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Extensive experiments demonstrate that DCLR outperforms existing methods in providing reliable confidence scores in multi-label scenarios.
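The idea of label vectors that give more weight to strongly correlated categories can be illustrated with a minimal sketch. This is not the DCLR algorithm itself: DCLR learns dynamic instance-level and prototype-level similarities, whereas the sketch below assumes a fixed, hand-supplied correlation matrix. The function name `adaptive_label_vectors`, the `epsilon` parameter, and the max-based scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np


def adaptive_label_vectors(targets, correlation, epsilon=0.1):
    """Soften multi-hot targets using category correlations (illustrative only).

    targets:     (N, C) multi-hot array, 1 where a category is present.
    correlation: (C, C) non-negative category-similarity matrix
                 (assumed given here; DCLR learns its similarities).
    epsilon:     smoothing budget; present categories get 1 - epsilon,
                 absent categories get at most epsilon.
    """
    targets = np.asarray(targets, dtype=float)
    corr = np.asarray(correlation, dtype=float).copy()
    np.fill_diagonal(corr, 0.0)  # a category does not smooth itself

    # score[n, j] = strongest correlation between category j and any
    # category that is present in sample n.
    # (N, C, 1) * (1, C, C) -> (N, C, C), then max over present categories.
    score = (targets[:, :, None] * corr[None, :, :]).max(axis=1)

    # Rescale scores to [0, 1] so that epsilon bounds the soft value.
    max_corr = corr.max()
    if max_corr > 0:
        score = score / max_corr

    # Present categories keep most of their mass; absent categories
    # receive a value proportional to their correlation score.
    return np.where(targets == 1, 1.0 - epsilon, epsilon * score)


# Hypothetical 3-category example: category 0 is present and is strongly
# correlated with category 1 but only weakly with category 2.
example_corr = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
example_targets = np.array([[1, 0, 0]])
soft_labels = adaptive_label_vectors(example_targets, example_corr)
print(soft_labels)
```

In this example the absent category that is strongly correlated with the present one receives a larger soft value than the weakly correlated one, whereas uniform label smoothing would treat both identically.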