Large pre-trained vision-language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts. To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed prompts; they are fine-tuned on domain-specific data. Prior prompt learning methods primarily learn a fixed prompt or a residual prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely Soft Prompt Generation (SPG). Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce a soft prompt label for each domain, aiming to incorporate domain knowledge into the generative model. During the inference phase, the generator of the generative model is employed to produce instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks spanning three DG tasks demonstrate that SPG achieves state-of-the-art performance. The code is available at https://github.com/renytek13/Soft-Prompt-Generation-with-CGAN.
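The inference-phase idea, a trained generator mapping an image's features (plus noise, in the CGAN setting named in the repository URL) to an instance-specific soft prompt, can be illustrated with a minimal sketch. All module names, layer sizes, and the prompt length below are hypothetical illustrations, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    """Hypothetical generator sketch: conditions on an image feature and a
    noise vector to emit soft prompt context vectors (n_ctx tokens of
    dimension ctx_dim), in the spirit of SPG's inference phase."""

    def __init__(self, feat_dim=512, noise_dim=64, n_ctx=4, ctx_dim=512):
        super().__init__()
        self.n_ctx, self.ctx_dim = n_ctx, ctx_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, 512),
            nn.ReLU(),
            nn.Linear(512, n_ctx * ctx_dim),
        )

    def forward(self, img_feat, noise):
        # Concatenate the conditioning feature with noise, then reshape the
        # output into per-instance soft prompt tokens.
        out = self.net(torch.cat([img_feat, noise], dim=-1))
        return out.view(-1, self.n_ctx, self.ctx_dim)

# Inference on an unseen target domain: each image feature yields its own
# soft prompt, which would be prepended to the class token embeddings.
gen = PromptGenerator()
feat = torch.randn(2, 512)  # stand-in for CLIP-style image features
z = torch.randn(2, 64)
prompt = gen(feat, z)
print(tuple(prompt.shape))  # (2, 4, 512)
```

Because the prompt depends on both the input instance and the sampled noise, this design addresses the diversity limitation the abstract attributes to fixed or residual prompts.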