With the ever-increasing complexity of large-scale pre-trained models, coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning of large-scale pre-trained models in vision still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon recently discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features across classes are maximally and equally distanced. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern when linear probing pre-trained models on downstream training data: the more collapsed the pre-trained model's features are on the downstream training data, the higher the transfer accuracy. We also study the relationship between NC and transfer accuracy on the source data. These findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip connections to induce last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver strong performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
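The two NC properties described above can be measured directly from last-layer features. Below is a minimal, illustrative sketch (not the paper's exact metric) that computes a within-class-to-between-class variability ratio, which shrinks toward zero as features collapse, and the spread of pairwise class-mean distances, which is zero when class means are equidistant. The function name `nc_metrics` is our own illustrative choice.

```python
import numpy as np

def nc_metrics(features, labels):
    """Illustrative neural-collapse statistics for last-layer features.

    features: (n_samples, d) array; labels: (n_samples,) integer class ids.
    Returns (within/between variability ratio, std of pairwise class-mean
    distances). Both tend toward 0 under full neural collapse.
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # Within-class scatter: mean squared distance of samples to their class mean.
    within = np.mean([
        np.mean(np.sum((features[labels == c] - class_means[i]) ** 2, axis=1))
        for i, c in enumerate(classes)
    ])
    # Between-class scatter: mean squared distance of class means to the global mean.
    between = np.mean(np.sum((class_means - global_mean) ** 2, axis=1))

    # Spread of pairwise class-mean distances; 0 means equally distanced means.
    k = len(classes)
    dists = [np.linalg.norm(class_means[i] - class_means[j])
             for i in range(k) for j in range(i + 1, k)]
    return within / between, float(np.std(dists))
```

For example, three classes whose features sit exactly at the vertices of an equilateral triangle (a 2D simplex) give a variability ratio of zero and near-zero spread of class-mean distances.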