Large annotated datasets inevitably contain incorrect labels, which poses a major challenge for training deep neural networks because these networks easily fit the noisy labels. Good generalization performance can be achieved only when training with a robust model that is not easily distracted by the noise. A simple yet effective way to create a noise-robust model is to use a noise-robust loss function. However, the number of proposed loss functions is large, they often come with hyperparameters, and they may learn more slowly than the widely used but noise-sensitive Cross Entropy loss. Through heuristic considerations and extensive numerical experiments, we study in which situations the proposed loss functions are applicable and give suggestions on how to choose an appropriate loss. Additionally, we propose a novel technique to enhance learning with bounded loss functions: the inclusion of an output bias, i.e., a slight increase in the neuron pre-activation corresponding to the correct label. Surprisingly, we find that this not only significantly improves the learning of bounded losses, but also leads to the Mean Absolute Error loss outperforming the Cross Entropy loss on the CIFAR-100 dataset, even in the absence of additional label noise. This suggests that training with a bounded loss function can be advantageous even in the presence of minimal label noise. To further strengthen our analysis of the learning behavior of different loss functions, we additionally design and test a novel loss function denoted as Bounded Cross Entropy.
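The output bias can be illustrated with a minimal sketch. It assumes the bias is a constant `epsilon` added to the labeled class's pre-activation before the softmax, and that the Mean Absolute Error loss is taken between the softmax output and the one-hot target. The function names and the value of `epsilon` are hypothetical and are not taken from the paper. Under these assumptions, the sketch shows that the bounded MAE loss has a very small gradient at a uniform 100-class output, and that the bias enlarges it.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of pre-activations.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def add_output_bias(logits, label, epsilon):
    # Assumed form of the output bias: raise only the labeled class's
    # pre-activation by epsilon (epsilon is a hypothetical hyperparameter).
    biased = list(logits)
    biased[label] += epsilon
    return biased

def mae_loss(logits, label):
    # MAE between softmax output and one-hot target.
    # This equals 2 * (1 - p_label), so it is bounded above by 2.
    probs = softmax(logits)
    return sum(abs(p - (1.0 if k == label else 0.0))
               for k, p in enumerate(probs))

def cross_entropy_loss(logits, label):
    # Standard Cross Entropy; unbounded as p_label approaches 0.
    return -math.log(softmax(logits)[label])

def mae_grad_wrt_label_logit(logits, label):
    # d/dz_label of 2 * (1 - p_label) = -2 * p_label * (1 - p_label).
    p = softmax(logits)[label]
    return -2.0 * p * (1.0 - p)

# Uniform output over 100 classes, as might occur early in training.
logits = [0.0] * 100
label = 3
biased_logits = add_output_bias(logits, label, epsilon=3.0)

grad_plain = mae_grad_wrt_label_logit(logits, label)
grad_biased = mae_grad_wrt_label_logit(biased_logits, label)
print(f"MAE loss (no bias):  {mae_loss(logits, label):.4f}")
print(f"CE loss (no bias):   {cross_entropy_loss(logits, label):.4f}")
print(f"|grad| without bias: {abs(grad_plain):.4f}")
print(f"|grad| with bias:    {abs(grad_biased):.4f}")
```

In this sketch the bias is applied only when computing the training loss; whether and how it is handled at inference time is not specified in the abstract and is left open here.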