Collecting large-scale datasets is crucial for training deep models. Annotating the data, however, inevitably yields noisy labels, which pose challenges to deep learning algorithms. Previous efforts tend to mitigate this problem by identifying and removing noisy samples or by correcting their labels according to statistical properties (e.g., loss values) of the training samples. In this paper, we tackle the problem from a new perspective: delving into the deep feature maps, we empirically find that models trained with clean and mislabeled samples manifest distinguishable activation feature distributions. Based on this observation, we propose a novel robust training approach termed adversarial noisy masking. The idea is to regularize deep features with a label-quality-guided masking scheme that adaptively modulates the input data and the label simultaneously, preventing the model from overfitting noisy samples. Further, an auxiliary task is designed to reconstruct the input data, which naturally provides noise-free self-supervised signals that reinforce the generalization ability of deep models. The proposed method is simple and flexible. It is tested on both synthetic and real-world noisy datasets, where it achieves significant improvements over previous state-of-the-art methods.
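The abstract does not spell out the masking rule, so the following is only an illustrative sketch of one way a label-quality-guided mask could modulate an input and its label together. It assumes, hypothetically, that the per-sample loss serves as the label-quality proxy, that a saliency map of the same shape as the input is available, and that the label is softened toward a uniform distribution by the same ratio used for masking. The function names `mask_ratio_from_loss` and `mask_input_and_label` and the parameter `max_ratio` are invented for this example and are not taken from the paper.

```python
import numpy as np

def mask_ratio_from_loss(losses, max_ratio=0.5):
    """Map per-sample losses to mask ratios in [0, max_ratio].

    Assumption: a higher loss suggests a lower-quality label, so it gets
    a larger mask ratio. Losses are min-max normalised within the batch.
    """
    losses = np.asarray(losses, dtype=float)
    span = losses.max() - losses.min()
    if span == 0:
        # All losses equal: no sample looks noisier than another.
        return np.zeros_like(losses)
    return max_ratio * (losses - losses.min()) / span

def mask_input_and_label(x, saliency, onehot, ratio):
    """Zero the most salient entries of x and soften the label by `ratio`.

    Assumption: masking the most salient region and smoothing the label
    by the same amount is one plausible reading of "modulating the input
    data and label simultaneously".
    """
    x = np.asarray(x, dtype=float)
    flat_sal = np.asarray(saliency, dtype=float).ravel()
    k = int(round(ratio * flat_sal.size))      # number of entries to mask
    keep = np.ones(flat_sal.size, dtype=bool)
    if k > 0:
        top = np.argsort(flat_sal)[::-1][:k]   # indices of the k most salient entries
        keep[top] = False
    x_masked = x * keep.reshape(x.shape)

    onehot = np.asarray(onehot, dtype=float)
    uniform = np.full_like(onehot, 1.0 / onehot.size)
    y_soft = (1.0 - ratio) * onehot + ratio * uniform
    return x_masked, y_soft
```

Under these assumptions, a sample with a low loss is passed through almost unchanged, while a high-loss sample loses its most salient region and receives a softer target, which limits how strongly a possibly mislabeled sample can drive the model.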