Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but fall short of supervised methods when generalizing to downstream tasks with limited data. Prompt learning is emerging as a parameter-efficient method for adapting VLMs, but state-of-the-art approaches require annotated samples. In this paper, we propose a novel approach to prompt learning based on unsupervised knowledge distillation from more powerful models. Our approach, which we call Knowledge Distillation Prompt Learning (KDPL), can be integrated into existing prompt learning techniques and eliminates the need for labeled examples during adaptation. Experiments on more than ten standard benchmark datasets demonstrate that KDPL is very effective at improving the generalization of learned prompts on zero-shot domain generalization, zero-shot cross-dataset generalization, and zero-shot base-to-novel class generalization problems. KDPL requires no ground-truth labels for adaptation; moreover, we show that it can effectively transfer knowledge even without any knowledge of the training class names. The code is publicly available at https://github.com/miccunifi/KDPL.
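To make the core idea concrete, the following is a minimal sketch of label-free distillation: a student (the prompted VLM) is trained to match the soft class distribution of a frozen, more powerful teacher, so no ground-truth labels are involved. This is an illustrative sketch only — the function names, the temperature value, and the use of a temperature-scaled KL objective are standard distillation conventions assumed here, not details taken from the paper.

```python
import numpy as np

def softmax(logits, t=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / t
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, t=2.0):
    """Unsupervised distillation loss KL(teacher || student) on softened
    distributions, scaled by t^2 (a common convention in distillation).
    Note: no ground-truth labels appear anywhere in this objective."""
    p = softmax(teacher_logits, t)   # teacher's soft "pseudo-labels"
    q = softmax(student_logits, t)   # prompted student's predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean() * t * t)

# Toy usage: a batch of 4 images scored against 10 candidate classes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))   # frozen, stronger model's logits
student = rng.normal(size=(4, 10))   # logits from the learnable-prompt student
loss = kd_loss(student, teacher)     # gradients would update only the prompts
```

In a full implementation only the prompt vectors would be optimized against this loss, with both the teacher and the student VLM backbones kept frozen.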