We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when they are fine-tuned on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint between the predictions of the trainable and pre-trained models to prevent overfitting on the downstream task. Additionally, we introduce two components into our consistency constraint to further boost performance: enforcing consistency on two perturbed inputs, and combining two dominant tuning paradigms, prompting and adapters. Enforcing consistency on perturbed inputs further regularizes the consistency constraint, thereby improving generalization. Moreover, integrating adapters and prompts not only enhances performance on downstream tasks but also offers increased tuning flexibility in both the input and output spaces, which facilitates more effective adaptation to downstream tasks in a few-shot learning setting. Experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. On generalization, CoPrompt improves the state-of-the-art on zero-shot tasks and on the overall harmonic mean over 11 datasets. Detailed ablation studies show the effectiveness of each of the components in CoPrompt. We make our code available at https://github.com/ShuvenduRoy/CoPrompt.
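The consistency constraint described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance measure or how the terms are weighted, so the following are assumptions made only for illustration: the consistency term is taken to be the cosine distance between an embedding from the trainable branch and an embedding from the frozen pre-trained branch (each computed on a differently perturbed view of the same input), and it is added to a standard cross-entropy loss with a hypothetical weight `lam`. All function names are hypothetical and are not taken from the released code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def consistency_loss(trainable_emb, frozen_emb):
    """Assumed consistency term: cosine distance between the two branches.

    It is 0 when the embeddings point in the same direction and grows
    as the trainable branch drifts away from the frozen one.
    """
    return 1.0 - cosine_similarity(trainable_emb, frozen_emb)


def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(logits, label, trainable_emb, frozen_emb, lam=1.0):
    """Supervised loss plus the weighted consistency term.

    `lam` is a hypothetical weighting hyperparameter.
    """
    supervised = cross_entropy(logits, label)
    return supervised + lam * consistency_loss(trainable_emb, frozen_emb)


if __name__ == "__main__":
    # Toy values standing in for the outputs of the two branches.
    logits = [2.0, 0.5, -1.0]
    trainable_emb = [0.9, 0.1, 0.0]
    frozen_emb = [1.0, 0.0, 0.0]
    print(total_loss(logits, label=0, trainable_emb=trainable_emb,
                     frozen_emb=frozen_emb, lam=1.0))
```

In this sketch the frozen embedding acts as a fixed target, so in a real training loop only the trainable branch (for example, its prompts and adapters) would receive gradients from the consistency term.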