Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity

We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from a distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information is modeled and captured by taking the worst case over distributional weights. The benefits of this perspective are threefold: (i) it provides a unified framework to explain the classical cross-entropy (CE) loss and SVM loss and their variants; (ii) it includes a special family corresponding to the temperature-scaled CE loss, which is widely adopted but poorly understood; (iii) it allows us to achieve adaptivity to the degree of uncertainty in the label information at an instance level. Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the degree of label noise of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy-label settings and 1 clean setting against 13 loss functions, and on one real-world noisy dataset. The code is open-sourced at \url{https://github.com/Optimization-AI/ICML2023_LDR}.
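To make the worst-case-over-label-weights idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a KL-divergence regularizer toward the uniform distribution over the $K$ classes, which the abstract does not specify; the function names `ldr_kl_loss` and `temperature_scaled_ce`, the parameter `lam` (playing the role of the temperature), and the scalar `margin` are hypothetical. Under that assumption, the maximization over weights $p$ on the simplex has the closed form $\lambda \log\big(\tfrac{1}{K}\sum_k \exp((f_k - f_y + c_k)/\lambda)\big)$, which with zero margin equals $\lambda$ times the temperature-scaled CE loss minus the constant $\lambda \log K$.

```python
import numpy as np


def ldr_kl_loss(scores, label, lam=1.0, margin=0.0):
    """Illustrative worst-case loss over label weights (assumed KL regularizer).

    Computes max_p sum_k p_k * d_k - lam * KL(p || uniform) in closed form,
    where d_k = f_k - f_y (+ margin for k != y). Returns the loss value and
    the maximizing weights p.
    """
    scores = np.asarray(scores, dtype=float)
    num_classes = scores.shape[0]
    # Score differences relative to the labeled class, with an optional
    # margin on the non-target classes (assumption: one scalar margin).
    diffs = scores - scores[label] + margin * (np.arange(num_classes) != label)
    z = diffs / lam
    z_max = z.max()
    # Numerically stable lam * log(mean(exp(z))).
    loss = lam * (z_max + np.log(np.mean(np.exp(z - z_max))))
    # The maximizing weights are a softmax of the scaled differences.
    weights = np.exp(z - z_max)
    weights /= weights.sum()
    return loss, weights


def temperature_scaled_ce(scores, label, lam=1.0):
    """Standard cross-entropy applied to scores divided by temperature lam."""
    z = np.asarray(scores, dtype=float) / lam
    z_max = z.max()
    log_sum_exp = z_max + np.log(np.sum(np.exp(z - z_max)))
    return log_sum_exp - z[label]


if __name__ == "__main__":
    f = [2.0, 1.0, 0.5]
    for lam in (0.01, 1.0, 100.0):
        loss, p = ldr_kl_loss(f, label=1, lam=lam, margin=1.0)
        print(f"lam={lam}: loss={loss:.4f}, worst-case weights={np.round(p, 3)}")
```

In this sketch, a small `lam` concentrates the worst-case weights on the class with the largest margin violation, so the loss approaches a max-margin (SVM-style) loss, while a large `lam` pushes the weights toward uniform. This is one way to read the abstract's claim that the framework connects the CE and SVM losses through the temperature parameter.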
