Reevaluating Loss Functions: Enhancing Robustness to Label Noise in Deep Learning Models

Large annotated datasets inevitably contain incorrect labels, which poses a major challenge for training deep neural networks because they easily fit the noisy labels. Good generalization performance can only be achieved when training with a robust model that is not easily distracted by the noise. A simple yet effective way to create a noise-robust model is to use a noise-robust loss function. However, the number of proposed loss functions is large, they often come with hyperparameters, and they may learn more slowly than the widely used but noise-sensitive Cross Entropy loss. Through heuristic considerations and extensive numerical experiments, we study in which situations the proposed loss functions are applicable and give suggestions on how to choose an appropriate loss. Additionally, we propose a novel technique to enhance learning with bounded loss functions: the inclusion of an output bias, i.e., a slight increase in the neuron pre-activation corresponding to the correct label. Surprisingly, we find that this not only significantly improves the learning of bounded losses, but also leads to the Mean Absolute Error loss outperforming the Cross Entropy loss on the CIFAR-100 dataset, even in the absence of additional label noise. This suggests that training with a bounded loss function can be advantageous even in the presence of minimal label noise. To further strengthen our analysis of the learning behavior of different loss functions, we additionally design and test a novel loss function denoted as Bounded Cross Entropy.
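The contrast between a bounded and an unbounded loss, and the effect of an output bias, can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the function names, the convention MAE = 2(1 - p_y) on softmax outputs, and the bias value of 3.0 are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of pre-activations."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Unbounded: grows without limit as the labeled class probability -> 0."""
    return -math.log(softmax(logits)[label])


def mean_absolute_error(logits, label):
    """Bounded: sum_k |p_k - onehot_k| = 2 * (1 - p_label), never above 2."""
    return 2.0 * (1.0 - softmax(logits)[label])


def add_output_bias(logits, label, bias):
    """Hypothetical helper: raise the pre-activation of the labeled class."""
    shifted = list(logits)
    shifted[label] += bias
    return shifted


def mae_grad_wrt_label_logit(logits, label):
    """d(MAE)/d(z_label) = -2 * p * (1 - p); close to zero when p is small."""
    p = softmax(logits)[label]
    return -2.0 * p * (1.0 - p)


# 100 classes with uniform logits, as at the start of training on a
# 100-class problem: the labeled class has probability 0.01.
logits = [0.0] * 100
label = 0
biased = add_output_bias(logits, label, bias=3.0)  # bias value is an assumption

print("MAE gradient without bias:", mae_grad_wrt_label_logit(logits, label))
print("MAE gradient with bias:   ", mae_grad_wrt_label_logit(biased, label))
```

Under these assumptions, the bounded loss gives only a small gradient while the labeled class probability is low, which is consistent with the slow learning the abstract attributes to bounded losses. Raising the labeled pre-activation moves the probability away from zero and enlarges that gradient, while the loss itself stays capped at 2 even for a confidently wrong prediction.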

