Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization to downstream vision tasks when given appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) was recently proposed to learn continuous prompts from task-specific training data. Despite its performance improvements on downstream tasks, several studies have reported that CoOp suffers from overfitting in two respects: (i) test accuracy on base classes first improves and then worsens during training; (ii) test accuracy on novel classes keeps decreasing. However, no existing study both explains and mitigates this overfitting. In this study, we first explore the cause of overfitting by analyzing the gradient flow. Comparative experiments reveal that CoOp favors generalizable features in the early training stage and spurious features in the later stage, which leads to the non-overfitting and overfitting phenomena, respectively. Given these observations, we propose Subspace Prompt Tuning (SubPT), which, during the entire training process, projects the back-propagated gradients onto the low-rank subspace spanned by the eigenvectors of the early-stage gradient flow, and which successfully eliminates the overfitting problem. In addition, we equip CoOp with a Novel Feature Learner (NFL) that enhances the generalization of the learned prompts to novel categories beyond the training set, without requiring image training data. Extensive experiments on 11 classification datasets demonstrate that SubPT+NFL consistently boosts the performance of CoOp and outperforms the state-of-the-art CoCoOp approach. Experiments on more challenging vision downstream tasks, including open-vocabulary object detection and zero-shot semantic segmentation, also verify the effectiveness of the proposed method. Code is available at https://tinyurl.com/mpe64f89.
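The abstract describes SubPT only at a high level. The following is a minimal NumPy sketch of the gradient-projection idea under explicit assumptions, not the authors' released implementation: the prompt parameters are flattened into one vector; the "early-stage gradient flow eigenvectors" are taken to be the top right singular vectors of a matrix of stacked early gradients (equivalently, the top eigenvectors of the uncentered matrix G^T G); and the function names `early_gradient_subspace`, `project_gradient`, and `sgd_step`, as well as the rank and the toy data, are hypothetical.

```python
import numpy as np


def early_gradient_subspace(grads: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal (d, rank) basis for the dominant directions of early gradients.

    grads: shape (num_steps, d), one flattened prompt gradient per row.
    Assumption: the right singular vectors of `grads` (the eigenvectors of the
    uncentered matrix grads.T @ grads) stand in for the paper's eigenvectors.
    """
    grads = np.asarray(grads, dtype=float)
    if not 1 <= rank <= min(grads.shape):
        raise ValueError("rank must be between 1 and min(num_steps, d)")
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:rank].T  # columns are orthonormal


def project_gradient(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonal projection of a flattened gradient onto span(basis)."""
    return basis @ (basis.T @ grad)


def sgd_step(params: np.ndarray, grad: np.ndarray, basis: np.ndarray, lr: float) -> np.ndarray:
    """Plain SGD update restricted to the early-gradient subspace (hypothetical helper)."""
    return params - lr * project_gradient(grad, basis)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, steps = 8, 20
    # Toy "early" gradients that all lie in a 2-D subspace of R^8.
    true_basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    early = rng.normal(size=(steps, 2)) @ true_basis.T
    basis = early_gradient_subspace(early, rank=2)

    # A toy "later" gradient with an extra component outside that subspace.
    later = true_basis @ np.array([1.0, -0.5]) + 0.3 * rng.normal(size=d)
    residual = later - project_gradient(later, basis)
    print("discarded part orthogonal to subspace:", np.allclose(basis.T @ residual, 0.0))
    print("new params:", sgd_step(np.zeros(d), later, basis, lr=0.1))
```

In a real setup the rows of `early` would be gradients of the learnable context vectors, and the projected update would be used for every step, as the abstract states. How the early gradients are collected, how many steps are recorded, and which rank is kept are not specified in the abstract and are left as assumptions here.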

Overfitting

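In AI, overfitting (also called overlearning) usually means that a learned model is too complex: it performs well on the training set but poorly on the test set, i.e., it generalizes poorly. It arises because the training data contain sampling error, and during parameter fitting a complex model fits that sampling error along with the underlying pattern.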
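To make the definition concrete, here is a small generic illustration (a textbook-style example, not taken from the paper above): noisy samples of a linear function are fitted with a degree-1 and a degree-9 polynomial. With 10 training points, the degree-9 model can pass through every point, so its training error is essentially zero, but it has also fitted the noise (the sampling error), which typically shows up as a higher test error. The data, noise level, degrees, and the helper names `fit_and_score` and `true_fn` are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x: np.ndarray) -> np.ndarray:
    """The underlying (linear) relationship the model should learn."""
    return 2.0 * x + 1.0


def mse(model: Polynomial, x: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((model(x) - y) ** 2))


def fit_and_score(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    return mse(model, x_train, y_train), mse(model, x_test, y_test)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x_train = np.linspace(0.0, 1.0, 10)
    # Noise plays the role of sampling error in the training data.
    y_train = true_fn(x_train) + rng.normal(scale=0.1, size=x_train.size)
    x_test = rng.uniform(0.0, 1.0, size=200)
    y_test = true_fn(x_test) + rng.normal(scale=0.1, size=x_test.size)
    for degree in (1, 9):
        train_err, test_err = fit_and_score(degree, x_train, y_train, x_test, y_test)
        print(f"degree={degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

The exact numbers depend on the random seed; the point is the gap between training and test error for the more complex model.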