Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but they fall short of supervised methods when adapting to downstream tasks with limited data. Prompt learning is emerging as a parameter-efficient method for adapting VLMs, but state-of-the-art approaches require annotated samples. In this paper we propose a novel approach to prompt learning based on unsupervised knowledge distillation from more powerful models. Our approach, which we call Knowledge Distillation Prompt Learning (KDPL), can be integrated into existing prompt learning techniques and eliminates the need for labeled examples during adaptation. Experiments on more than ten standard benchmark datasets show that KDPL substantially improves the generalization of learned prompts in zero-shot domain generalization, zero-shot cross-dataset generalization, and zero-shot base-to-novel class generalization settings. KDPL requires no ground-truth labels for adaptation, and we further show that it can transfer knowledge effectively even when the training class names are unknown. The code is publicly available at https://github.com/miccunifi/KDPL.
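The unsupervised distillation signal described above can be sketched as follows. This is a minimal illustration, assuming the student's prompt parameters are trained to match the teacher's class distribution on unlabeled images via a temperature-scaled KL divergence; the function names and the exact loss form are illustrative assumptions, not the paper's API:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the class dimension.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence KL(teacher || student) over a batch.

    No ground-truth labels are used: the stronger teacher VLM's
    predictions on unlabeled images act as soft targets, and the loss
    is minimized with respect to the learnable prompt parameters only.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl))

# Toy example: logits for 2 unlabeled images over 3 candidate classes.
teacher = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
student = np.array([[2.0, 1.5, 0.5], [0.3, 2.5, 0.4]])
loss = kd_loss(student, teacher)  # scalar to minimize w.r.t. the prompts
print(loss > 0.0)
```

Because only the teacher's soft predictions are consumed, this objective applies unchanged in the label-free settings the abstract describes.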