Manually annotating datasets for training deep models is very labor-intensive and time-consuming. To overcome this drawback, directly leveraging web images as training data has become a natural choice. Nevertheless, the label noise present in web data usually degrades model performance. Existing methods for combating label noise are typically designed and tested on synthetic noisy datasets, and they often fail to achieve satisfactory results on real-world noisy datasets. To this end, we propose a method named GRIP to alleviate the noisy label problem on both synthetic and real-world datasets. Specifically, GRIP uses a group regularization strategy that estimates class soft labels to improve noise robustness. Soft label supervision reduces overfitting to noisy labels and learns inter-class similarities that benefit classification. Furthermore, an instance purification operation globally identifies noisy labels by measuring the difference between each training sample and its class soft label. By operating at both the group and instance levels, our approach integrates the advantages of noise-robust and noise-cleaning methods and substantially alleviates the performance degradation caused by noisy labels. Comprehensive experimental results on synthetic and real-world datasets demonstrate the superiority of GRIP over existing state-of-the-art methods.
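The abstract does not give GRIP's formulas, so the following is only a minimal NumPy sketch of the two ideas under our own assumptions. We assume the class soft label is the mean softmax prediction over all samples annotated with that class (group level), that training uses cross-entropy against that soft label instead of the one-hot label, and that purification scores each sample by the KL divergence between its class soft label and its own prediction, applying one dataset-wide threshold (instance level). The function names, the `keep_ratio` parameter, and the choice of KL divergence are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    # Numerically stable row-wise softmax over class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def estimate_class_soft_labels(probs, labels, num_classes):
    # Group level (assumption): the soft label of class c is the mean predicted
    # distribution over all samples whose given (possibly noisy) label is c.
    # Classes with no samples fall back to a uniform distribution.
    soft = np.full((num_classes, num_classes), 1.0 / num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            soft[c] = probs[mask].mean(axis=0)
    return soft


def soft_label_loss(probs, labels, soft_labels, eps=1e-12):
    # Cross-entropy against each sample's class soft label instead of its
    # one-hot label; off-diagonal mass encodes inter-class similarity.
    targets = soft_labels[labels]
    return float(-(targets * np.log(probs + eps)).sum(axis=1).mean())


def purify(probs, labels, soft_labels, keep_ratio=0.8, eps=1e-12):
    # Instance level (assumption): score each sample by KL(soft label || prediction)
    # and apply ONE threshold across the whole dataset (a global ranking),
    # keeping roughly the keep_ratio fraction with the smallest divergence.
    targets = soft_labels[labels]
    kl = (targets * (np.log(targets + eps) - np.log(probs + eps))).sum(axis=1)
    threshold = np.quantile(kl, keep_ratio)
    return kl, kl <= threshold


# Toy usage: sample 3 is annotated as class 0, but the model confidently predicts class 2.
logits = np.array([[5, 0, 0], [5, 0, 0], [5, 0, 0], [0, 0, 5],
                   [0, 5, 0], [0, 5, 0], [0, 0, 5], [0, 0, 5]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 2, 2])
probs = softmax(logits)
soft_labels = estimate_class_soft_labels(probs, labels, num_classes=3)
loss = soft_label_loss(probs, labels, soft_labels)
kl, keep = purify(probs, labels, soft_labels)
print("soft-label loss:", loss)
print("flagged as noisy:", np.where(~keep)[0])
```

Ranking all samples against a single dataset-wide quantile, rather than thresholding within each class, is one way to read "globally identifies"; a per-class threshold would be an equally plausible alternative under these assumptions.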