With the ever-increasing complexity of large-scale pre-trained models, coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning of large-scale pre-trained models in vision still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon recently discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features across classes are maximally and equally distanced. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern when linear probing pre-trained models on downstream training data: the more collapsed the pre-trained model's features are on the downstream training data, the higher the transfer accuracy. We also study the relationship between NC and transfer accuracy on the source data. These findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip connections to induce last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver strong performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
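The two NC properties described above can be measured directly from last-layer features. Below is a minimal, illustrative sketch (not the paper's exact metric) that computes a within-class-to-between-class variability ratio, which shrinks toward zero as features collapse, and the spread of pairwise class-mean distances, which is zero when class means are equidistant. The function name `nc_metrics` is our own illustrative choice.

```python
import numpy as np

def nc_metrics(features, labels):
    """Illustrative neural-collapse statistics for last-layer features.

    features: (n_samples, d) array; labels: (n_samples,) integer class ids.
    Returns (within/between variability ratio, std of pairwise class-mean
    distances). Both tend toward 0 under full neural collapse.
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # Within-class scatter: mean squared distance of samples to their class mean.
    within = np.mean([
        np.mean(np.sum((features[labels == c] - class_means[i]) ** 2, axis=1))
        for i, c in enumerate(classes)
    ])
    # Between-class scatter: mean squared distance of class means to the global mean.
    between = np.mean(np.sum((class_means - global_mean) ** 2, axis=1))

    # Spread of pairwise class-mean distances; 0 means equally distanced means.
    k = len(classes)
    dists = [np.linalg.norm(class_means[i] - class_means[j])
             for i in range(k) for j in range(i + 1, k)]
    return within / between, float(np.std(dists))
```

For example, three classes whose features sit exactly at the vertices of an equilateral triangle (a 2D simplex) give a variability ratio of zero and near-zero spread of class-mean distances.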