Supervised learning of deep neural networks relies heavily on large-scale datasets annotated with high-quality labels. Mislabeled samples, by contrast, can significantly degrade generalization: the model memorizes them and thereby learns erroneous associations between data content and incorrect annotations. To address this, this paper proposes an efficient approach to noisy labels that learns robust feature representations through unsupervised augmentation restoration and cluster regularization. In addition, progressive self-bootstrapping is introduced to minimize the negative impact of supervision from noisy labels. The proposed design is generic and can be applied to existing classification architectures with minimal overhead. Experimental results show that the proposed method efficiently and effectively enhances model robustness under severely noisy labels.
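The abstract does not give the exact form of the progressive self-bootstrapping objective. As a rough illustration of the general idea only, the sketch below uses a common soft-bootstrapping formulation in which the training target is a mix of the (possibly noisy) label and the model's own prediction, with the mixing weight ramped up over training. The function names, the linear schedule, and the `max_weight` value are assumptions made for this example, not the paper's method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bootstrap_weight(epoch, total_epochs, max_weight=0.8):
    """Hypothetical linear ramp: trust in the model's own prediction
    grows from 0 at the first epoch to max_weight at the last."""
    if total_epochs <= 1:
        return max_weight
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return max_weight * frac

def bootstrapped_target(one_hot_label, probs, weight):
    """Convex mix of the given label and the model prediction."""
    return [(1.0 - weight) * y + weight * p
            for y, p in zip(one_hot_label, probs)]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

# Usage: a sample whose given label (class 1) disagrees with the
# model's confident prediction (class 0).
logits = [2.0, 0.5, -1.0]
noisy_label = [0.0, 1.0, 0.0]
probs = softmax(logits)
for epoch in (0, 5, 9):
    w = bootstrap_weight(epoch, total_epochs=10)
    target = bootstrapped_target(noisy_label, probs, w)
    print(epoch, round(w, 3), round(cross_entropy(target, probs), 4))
```

With a weight of zero the loss reduces to ordinary cross-entropy against the given label. As the weight grows, the suspect label contributes less to the target, which is the intuition behind reducing supervision from noisy labels progressively.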