Network calibration aims to estimate confidence levels accurately, which is particularly important when deploying deep neural networks in real-world systems. Recent approaches leverage mixup to calibrate a network's predictions during training. However, they overlook the problem that the label mixtures used in mixup may not accurately represent the actual distribution of the augmented samples. In this paper, we present RankMixup, a novel mixup-based framework that alleviates this label-mixture problem for network calibration. To this end, we propose using the ordinal ranking relationship between raw and mixup-augmented samples as an alternative supervisory signal to the label mixtures. We hypothesize that the network should estimate a higher confidence for raw samples than for augmented ones (Fig. 1). To implement this idea, we introduce a mixup-based ranking loss (MRL) that encourages lower confidences for augmented samples than for raw ones, thereby maintaining the ranking relationship. We also propose leveraging the ranking relationship among multiple mixup-augmented samples to further improve calibration. Augmented samples with larger mixing coefficients are expected to have higher confidences, and vice versa (Fig. 1); that is, the order of confidences should be aligned with that of the mixing coefficients. To this end, we introduce a novel loss, M-NDCG, which reduces the number of misaligned pairs of coefficients and confidences. Extensive experimental results on standard benchmarks for network calibration demonstrate the effectiveness of RankMixup.
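The two ranking ideas described above can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract gives no equations, so the hinge form of the ranking loss, the `margin` value, the use of maximum softmax probability as confidence, and the textbook NDCG definition (with mixing coefficients as relevance scores) are all assumptions made here for illustration. The function names are hypothetical.

```python
import math


def max_softmax_confidence(logits):
    """Confidence taken as the maximum softmax probability (an assumption)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def mixup_ranking_loss(conf_raw, conf_aug, margin=0.1):
    """Hinge-style ranking penalty (assumed form, hypothetical margin).

    The penalty is zero when the augmented sample's confidence is at
    least `margin` below the raw sample's confidence, and grows
    linearly as that ordering is violated.
    """
    return max(0.0, conf_aug - conf_raw + margin)


def ndcg_alignment_loss(mix_coeffs, confidences):
    """1 - NDCG, with mixing coefficients used as relevance scores.

    Samples are ranked by predicted confidence. The loss is zero when
    that ranking matches the ranking by mixing coefficient, and
    positive otherwise. Assumes at least one coefficient is > 0.
    This uses the standard NDCG definition, not necessarily M-NDCG.
    """
    def dcg(relevances):
        # Standard DCG: gain (2^rel - 1), discount log2(position + 1).
        return sum((2.0 ** r - 1.0) / math.log2(i + 2)
                   for i, r in enumerate(relevances))

    # Coefficients reordered by descending confidence.
    ranked = sorted(zip(confidences, mix_coeffs),
                    key=lambda pair: pair[0], reverse=True)
    by_confidence = [lam for _, lam in ranked]
    ideal = sorted(mix_coeffs, reverse=True)
    return 1.0 - dcg(by_confidence) / dcg(ideal)


if __name__ == "__main__":
    raw = max_softmax_confidence([4.0, 0.5, 0.1])
    aug = max_softmax_confidence([1.5, 1.2, 0.1])
    print("ranking loss:", mixup_ranking_loss(raw, aug))
    print("alignment loss:",
          ndcg_alignment_loss([0.9, 0.7, 0.5], [0.4, 0.6, 0.8]))
```

Because this sketch sorts with hard comparisons, the alignment term works as a diagnostic rather than a trainable objective; a training loss would need a formulation through which gradients can reach the confidences.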