The transfer learning paradigm of model pre-training followed by fine-tuning produces high-accuracy models. While most studies recommend scaling up pre-training to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image, and contrastive image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of pre-training data source is essential for few-shot transfer, but its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000× more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive with image-image contrastive pre-training, and find that the latter leads to better downstream accuracy.