Foundation models, including Vision Language Models (VLMs) and Large Language Models (LLMs), possess the generality to handle diverse distributions and tasks, which stems from their extensive pre-training datasets. Fine-tuning foundation models is a common practice to enhance task performance or to align a model's behavior with human expectations, allowing the model to gain speciality. However, the small datasets used for fine-tuning may not adequately cover the diverse distributions and tasks encountered during pre-training. Consequently, the pursuit of speciality during fine-tuning can lead to a loss of generality in the model, which is related to catastrophic forgetting (CF) in deep learning. In this study, we demonstrate this phenomenon in both VLMs and LLMs. For instance, fine-tuning VLMs like CLIP on ImageNet results in a loss of generality in handling diverse distributions, and fine-tuning LLMs like Galactica in the medical domain leads to a loss of instruction-following ability and common sense. To address the trade-off between speciality and generality, we investigate multiple regularization methods from continual learning; the weight averaging method (Wise-FT) from out-of-distribution (OOD) generalization, which interpolates parameters between the pre-trained and fine-tuned models; and parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA). Our findings show that both continual learning and Wise-FT methods effectively mitigate the loss of generality, with Wise-FT exhibiting the strongest performance in balancing speciality and generality.
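The parameter interpolation attributed to Wise-FT above can be illustrated with a minimal sketch. It assumes a simple linear interpolation with a single mixing coefficient `alpha` (0 keeps the pre-trained weights, 1 keeps the fine-tuned weights); the function name `interpolate_weights`, the `Params` type, and the toy parameter values are hypothetical and chosen only for illustration, not taken from the study.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def interpolate_weights(pretrained: Params, finetuned: Params, alpha: float) -> Params:
    """Return (1 - alpha) * pretrained + alpha * finetuned, parameter by parameter.

    Assumption: alpha = 0 recovers the pre-trained model (most general),
    alpha = 1 recovers the fine-tuned model (most specialized).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")

    merged: Params = {}
    for name, w_pre in pretrained.items():
        w_ft = finetuned[name]
        if len(w_pre) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two weight vectors.
        merged[name] = [(1.0 - alpha) * p + alpha * f for p, f in zip(w_pre, w_ft)]
    return merged


# Toy example with made-up weights.
pretrained = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
finetuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = interpolate_weights(pretrained, finetuned, alpha=0.5)
```

Under this assumed convention, sweeping `alpha` from 0 to 1 would trace out a family of models between the pre-trained and fine-tuned endpoints, which is one way to explore the speciality-generality trade-off described above.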