Previous research has shown that constraining the gradient of the loss function with respect to model-predicted probabilities can enhance model robustness against noisy labels. These methods typically use validation data to specify a fixed, supposedly optimal threshold for gradient clipping, aiming to obtain the desired robustness against noise. However, this common practice overlooks the dynamic distribution of gradients from clean and noisy-labeled samples at different stages of training, significantly limiting the model's ability to adapt to the variable nature of gradients throughout the training process. To address this issue, we propose a simple yet effective approach called Optimized Gradient Clipping (OGC), which dynamically adjusts the clipping threshold based on the ratio of noise gradients to clean gradients after clipping, estimated by modeling the distributions of clean and noisy samples. This approach allows us to modify the clipping threshold at each training step, effectively controlling the influence of noise gradients. In addition, we provide a statistical analysis that certifies the noise-tolerance ability of OGC. Our extensive experiments across various types of label noise, including symmetric, asymmetric, instance-dependent, and real-world noise, demonstrate the effectiveness of our approach.
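The core idea of the threshold-selection step can be illustrated with a minimal sketch. For cross-entropy, the gradient magnitude with respect to the predicted probability of the labeled class is 1/p, so clipping it at a threshold τ mostly affects low-confidence (often mislabeled) samples. The sketch below picks the largest candidate threshold whose post-clipping noise-to-clean gradient ratio stays under a target; the function names, the candidate grid, and the use of raw probability arrays in place of the modeled clean/noisy distributions are all illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def clipped_grad_norm(p, tau):
    # Magnitude of the CE gradient w.r.t. the predicted probability
    # of the labeled class, |d(-log p)/dp| = 1/p, clipped at tau.
    return np.minimum(1.0 / p, tau)

def choose_threshold(clean_probs, noisy_probs, candidates, max_ratio):
    # Hypothetical selection rule: among candidate thresholds, keep the
    # largest one whose ratio of total noise-gradient mass to total
    # clean-gradient mass (after clipping) stays below max_ratio.
    # clean_probs / noisy_probs stand in for samples drawn from the
    # modeled clean and noisy distributions.
    best = min(candidates)
    for tau in sorted(candidates):
        noise = clipped_grad_norm(noisy_probs, tau).sum()
        clean = clipped_grad_norm(clean_probs, tau).sum()
        if noise / clean <= max_ratio:
            best = tau
    return best

# Noisy-labeled samples tend to have low predicted probability on their
# (wrong) label, so their unclipped 1/p gradients are large; the chosen
# threshold caps exactly that contribution.
clean = np.array([0.8, 0.9])   # high-confidence, likely clean
noisy = np.array([0.05, 0.1])  # low-confidence, likely mislabeled
tau = choose_threshold(clean, noisy, [0.5, 1.0, 2.0, 5.0, 10.0], max_ratio=2.0)
```

Because the noise-to-clean ratio grows as the threshold loosens, scanning candidates from small to large and keeping the last admissible one mimics the adaptive, per-step adjustment the abstract describes.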