Handling distribution shifts from the training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in machine learning. While pre-trained vision-language models such as CLIP have demonstrated remarkable zero-shot performance, further adapting them to downstream tasks degrades their performance on OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge stored in the pre-trained model. SAFT updates only a small subset of important parameters, those with large gradient magnitudes, while keeping the remaining parameters frozen. SAFT is simple in concept and straightforward to implement. Extensive experiments show that, by updating only 0.1% of the model parameters, SAFT significantly improves the performance of CLIP and consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT yields an average gain of 5.15% over conventional fine-tuning in OOD settings.
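The core mechanism described above, selecting the top fraction of parameters by gradient magnitude and updating only those, can be sketched as follows. This is a schematic illustration under our own assumptions, not the authors' implementation: the function names `saft_mask` and `saft_update` are hypothetical, and details such as when the mask is computed and which loss produces the gradients are left out.

```python
import numpy as np

def saft_mask(grad, sparsity=0.001):
    """Boolean mask selecting the top `sparsity` fraction of parameters
    by gradient magnitude (illustrative; 0.001 mirrors the 0.1% figure)."""
    flat = np.abs(grad).ravel()
    k = max(1, int(round(sparsity * flat.size)))
    # Threshold at the k-th largest magnitude.
    thresh = np.partition(flat, -k)[-k]
    return np.abs(grad) >= thresh

def saft_update(params, grad, lr=0.1, sparsity=0.001):
    """One sparse fine-tuning step: apply the gradient only where the
    mask is set; all other parameters stay frozen."""
    mask = saft_mask(grad, sparsity)
    return params - lr * grad * mask, mask

# Toy example: 1,000 parameters, so a 0.1% mask keeps a single one.
rng = np.random.default_rng(0)
params = rng.normal(size=1000)
grad = rng.normal(size=1000)

new_params, mask = saft_update(params, grad)
print(mask.sum())  # number of parameters selected for updating
```

In practice the mask would be applied per training step inside an optimizer, and the frozen parameters preserve the pre-trained model's general knowledge while the sparse subset adapts to the downstream task.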