We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. CoPrompt addresses the challenge of improving the generalization of large foundation models when they are fine-tuned on downstream tasks in a few-shot setting. The basic idea is to enforce a consistency constraint between the predictions of the trainable and pre-trained models to prevent overfitting on the downstream task. We further introduce two components into the consistency constraint to boost performance: enforcing consistency on two perturbed inputs, and combining two dominant tuning paradigms, prompting and adapters. Enforcing consistency on perturbed inputs further regularizes the constraint and improves generalization, while tuning additional parameters with prompts and adapters improves performance on downstream tasks. Extensive experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. On the generalization task, CoPrompt improves the state-of-the-art by 2.09% on the zero-shot task and by 1.93% on the harmonic mean over 11 recognition datasets. Detailed ablation studies show the effectiveness of each component of CoPrompt.
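
To make the consistency constraint concrete, the following is a minimal sketch rather than the method's actual implementation. It assumes that consistency is measured as cosine distance between the outputs of the trainable and frozen pre-trained models, and that the result is added to the task loss with a scalar weight. The abstract specifies neither choice, and the names `cosine_distance`, `consistency_loss`, `total_loss`, and `weight` are hypothetical. In the perturbed-input variant, the two output batches would come from two differently perturbed views of the same inputs.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def consistency_loss(trainable_out, frozen_out):
    """Mean cosine distance over a batch of paired model outputs.

    Assumption: cosine distance is the consistency measure; the
    abstract only states that a consistency constraint is enforced.
    """
    pairs = list(zip(trainable_out, frozen_out))
    return sum(cosine_distance(t, f) for t, f in pairs) / len(pairs)


def total_loss(task_loss, trainable_out, frozen_out, weight=1.0):
    """Task loss plus a weighted consistency term (weight is hypothetical)."""
    return task_loss + weight * consistency_loss(trainable_out, frozen_out)


# Toy batch: the first pair agrees exactly, the second pair is orthogonal.
trainable = [[1.0, 0.0], [0.0, 2.0]]
frozen = [[1.0, 0.0], [2.0, 0.0]]
loss = total_loss(task_loss=0.3, trainable_out=trainable, frozen_out=frozen, weight=2.0)
```

The consistency term is zero when the trainable model reproduces the frozen model's output direction and grows as the two diverge, which is how such a term can discourage drifting away from the pre-trained model during few-shot fine-tuning.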