Pre-training on large-scale datasets and then fine-tuning on downstream tasks has become standard practice in deep learning. However, pre-training data often contain label noise that may adversely affect the generalization of the model. This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks. More specifically, through extensive experiments with supervised pre-training on synthetic noisy ImageNet-1K and YFCC15M datasets, we demonstrate that while slight noise in pre-training can benefit in-domain (ID) transfer performance, where the training and testing data share the same distribution, it always deteriorates out-of-domain (OOD) performance, where the training and testing data distributions differ. We empirically verify that the reason is that noise in pre-training shapes the feature space differently. We then propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization on both ID and OOD tasks, considering that one may not be able to fully fine-tune or even access the pre-trained models. We conduct practical experiments on popular vision and language models pre-trained on noisy data to evaluate our approach. Our analysis and results show the importance of this interesting and novel research direction, which we term Noisy Model Learning.
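To make the black-box setting concrete, the following is a minimal sketch and not the NMTune method itself: the abstract does not specify the objective or architecture, so this example only assumes that features from a frozen pre-trained model are available as arrays, and it learns an affine map on those features together with a linear classifier using plain cross-entropy. The function names (`fit_affine_head`, `predict`, `run_demo`), the hyperparameters, and the synthetic Gaussian "features" are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_affine_head(feats, labels, num_classes, dim_out=16,
                    lr=0.1, steps=300, seed=0):
    """Learn an affine map (A, c) on frozen features plus a linear
    classifier (W, b). The pre-trained model is never touched: only its
    output features are used, which is the black-box assumption."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    A = rng.normal(scale=1.0 / np.sqrt(d), size=(d, dim_out))
    c = np.zeros(dim_out)
    W = rng.normal(scale=1.0 / np.sqrt(dim_out), size=(dim_out, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]

    for _ in range(steps):
        z = feats @ A + c                 # affine-transformed features
        probs = softmax(z @ W + b)
        g = (probs - onehot) / n          # grad of mean cross-entropy w.r.t. logits
        dz = g @ W.T
        grad_W, grad_b = z.T @ g, g.sum(axis=0)
        grad_A, grad_c = feats.T @ dz, dz.sum(axis=0)
        A -= lr * grad_A
        c -= lr * grad_c
        W -= lr * grad_W
        b -= lr * grad_b
    return A, c, W, b


def predict(feats, params):
    A, c, W, b = params
    return ((feats @ A + c) @ W + b).argmax(axis=1)


def run_demo(seed=0):
    """Synthetic stand-in for frozen features: three Gaussian blobs."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(3, 8))
    y_train = rng.integers(0, 3, size=600)
    y_test = rng.integers(0, 3, size=300)
    x_train = centers[y_train] + rng.normal(size=(600, 8))
    x_test = centers[y_test] + rng.normal(size=(300, 8))

    # Standardize using training statistics only.
    mu, sigma = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
    x_train, x_test = (x_train - mu) / sigma, (x_test - mu) / sigma

    params = fit_affine_head(x_train, y_train, num_classes=3)
    return float((predict(x_test, params) == y_test).mean())


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.3f}")
```

Because an affine map followed by a linear classifier is itself linear, this sketch illustrates only the workflow (frozen features in, small trainable head out). Any additional objective that reshapes the feature space would be added to the loss inside the training loop.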