Collecting large-scale datasets is crucial for training deep models. Annotating the data, however, inevitably yields noisy labels, which pose challenges to deep learning algorithms. Previous efforts mitigate this problem by identifying and removing noisy samples, or by correcting their labels, according to statistical properties (e.g., loss values) of the training samples. In this paper, we tackle the problem from a new perspective: delving into the deep feature maps, we empirically find that models trained with clean and mislabeled samples manifest distinguishable activation feature distributions. Based on this observation, we propose a novel robust training approach termed adversarial noisy masking. The idea is to regularize deep features with a label-quality-guided masking scheme that adaptively modulates the input data and the label simultaneously, preventing the model from overfitting noisy samples. Further, an auxiliary task is designed to reconstruct the input data; it naturally provides noise-free self-supervised signals that reinforce the generalization ability of deep models. The proposed method is simple and flexible. It is tested on both synthetic and real-world noisy datasets, where it achieves significant improvements over previous state-of-the-art methods.
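The abstract does not spell out the masking rule, so the following is only a minimal sketch of what a label-quality-guided mask that modulates input and label together might look like. Everything here is an assumption made for illustration: the function name `quality_guided_mask`, the `quality` score in [0, 1] (1 meaning the label is likely clean), the `max_ratio` cap, the choice to zero out the most activated positions, and the choice to soften the label toward a uniform distribution by the same ratio. It is not the paper's implementation.

```python
def quality_guided_mask(image, activation, onehot, quality, max_ratio=0.5):
    """Illustrative sketch: mask input and soften label, both scaled by label quality.

    image      -- 2D list of floats (H x W), the input sample
    activation -- 2D list of floats (H x W), a hypothetical importance map
    onehot     -- list of floats, the (possibly noisy) one-hot label
    quality    -- assumed label-quality score in [0, 1]; lower means more suspect
    max_ratio  -- assumed upper bound on the masked fraction
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [0, 1]")

    # Lower label quality leads to a larger mask ratio (assumed linear rule).
    ratio = max_ratio * (1.0 - quality)

    height, width = len(image), len(image[0])
    num_masked = int(ratio * height * width)

    # Rank all positions by activation, highest first.
    positions = sorted(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: activation[rc[0]][rc[1]],
        reverse=True,
    )

    # Zero out the most activated positions on a copy of the input.
    masked = [row[:] for row in image]
    for r, c in positions[:num_masked]:
        masked[r][c] = 0.0

    # Modulate the label at the same time: blend it with a uniform distribution.
    num_classes = len(onehot)
    soft = [(1.0 - ratio) * p + ratio / num_classes for p in onehot]
    return masked, soft
```

Under these assumptions, a sample with `quality=1.0` passes through unchanged, while a sample with `quality=0.0` has the top `max_ratio` fraction of its most activated positions removed and its label softened accordingly. The reconstruction task mentioned in the abstract is not sketched here, since the abstract gives no detail on its architecture or loss.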