As models continue to grow in size while labeled training data remain scarce, transfer learning has become a popular approach in many scientific and engineering fields. This study examines neural collapse (NC) in transfer learning for classification problems. NC is a phenomenon in which the last-layer features and classifiers of deep networks exhibit zero within-class feature variability and maximally, equally separated between-class feature means. Through the lens of NC, this work makes the following findings on transfer learning: (i) during pre-training on source data, limiting within-class variability collapse to a certain extent leads to better transferability, because it better preserves the intrinsic structure of the input data; (ii) during fine-tuning on downstream data, obtaining features with more NC results in better test accuracy. These results provide new insight into heuristics commonly used in model pre-training, such as loss design, data augmentation, and projection heads, and they lead to more efficient and principled methods for fine-tuning large pre-trained models. Compared with full-model fine-tuning, our proposed fine-tuning methods achieve comparable or even better performance while reducing the number of fine-tuned parameters by at least 70% and alleviating overfitting.
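The two NC properties named above can be checked numerically on a set of last-layer features. The sketch below is illustrative, not the paper's measurement code: the function name `nc_summary` is hypothetical, features are assumed to be plain Python lists with balanced classes and at least two classes, and the within-class measure is a simplified trace ratio tr(Σ_W)/tr(Σ_B) rather than the pseudo-inverse form tr(Σ_W Σ_B^†)/K often used in the NC literature. Equal, maximal separation is checked by comparing the pairwise cosines of the centered class means with -1/(K-1), the value attained by a simplex equiangular tight frame.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Vector]) -> Vector:
    """Coordinate-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nc_summary(features: Sequence[Vector], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: rough NC diagnostics for last-layer features.

    within_between_ratio: tr(Sigma_W) / tr(Sigma_B), a simplified proxy for
        within-class variability (0.0 means every feature equals its class mean).
    max_cos_dev: largest |cos(mu_i, mu_j) - (-1/(K-1))| over pairs of centered
        class means (0.0 means the means form an equiangular simplex).
    Assumes K >= 2 classes and no class mean equal to the global mean.
    """
    by_class: Dict[int, List[Vector]] = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(list(x))
    classes = sorted(by_class)
    k = len(classes)
    if k < 2:
        raise ValueError("need at least two classes")

    global_mean = _mean([list(x) for x in features])
    means = {c: _mean(by_class[c]) for c in classes}

    # tr(Sigma_W): average squared distance of each feature to its class mean.
    within = sum(_sq_dist(list(x), means[y]) for x, y in zip(features, labels)) / len(features)
    # tr(Sigma_B): average squared distance of class means to the global mean.
    between = sum(_sq_dist(means[c], global_mean) for c in classes) / k

    # Centered class means; a simplex ETF has all pairwise cosines equal to -1/(K-1).
    centered = [[m - g for m, g in zip(means[c], global_mean)] for c in classes]
    target = -1.0 / (k - 1)
    max_cos_dev = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            a, b = centered[i], centered[j]
            dot = sum(x * y for x, y in zip(a, b))
            cos = dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))
            max_cos_dev = max(max_cos_dev, abs(cos - target))

    return {"within_between_ratio": within / between, "max_cos_dev": max_cos_dev}
```

For example, six 2-D features placed exactly on the vertices of an equilateral triangle centered at the origin (two per class) give a ratio of 0.0 and a cosine deviation near zero, i.e. full collapse; perturbing the features within each class makes the ratio positive, which is the kind of retained within-class variability finding (i) associates with better transferability.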