Visual prompt learning is a recently emerged technique that adapts the knowledge of a large-scale pre-trained model to downstream tasks through prompts. Previous research has focused on designing effective prompts; in this work, we argue that a good mapping strategy matters more than prompt design. We therefore propose SeMap, a more effective mapping that exploits the semantic alignment between the pre-trained model's knowledge and the downstream task. Our experimental results show that SeMap substantially improves the performance of visual prompt learning. Moreover, SeMap achieves competitive zero-shot transfer: it can perform the downstream task without any fine-tuning on the corresponding dataset. This demonstrates the potential of our method in a broader range of applications where zero-shot transfer is desired. These results suggest that SeMap could lead to significant advances in both visual prompt learning and zero-shot transfer. We hope that SeMap helps the community move toward more efficient and lightweight use of large vision models.
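To make the notion of a semantic mapping concrete, the following is a minimal sketch rather than the SeMap implementation. The abstract does not specify how the alignment is computed, so everything here is an assumption: class names are represented by hand-written toy embeddings (in practice these might come from a text encoder), each downstream class is mapped to its `k` most similar pre-trained classes by cosine similarity, and the pre-trained model's logits for those classes are averaged. The names `build_semantic_mapping` and `map_logits` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_semantic_mapping(source_emb, target_emb, k=1):
    """For each downstream (target) class, return the indices of the k
    pre-trained (source) classes whose embeddings are most similar."""
    mapping = []
    for t in target_emb:
        sims = [cosine(t, s) for s in source_emb]
        ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
        mapping.append(ranked[:k])
    return mapping

def map_logits(source_logits, mapping):
    """Turn one vector of pre-trained-class logits into downstream-class
    logits by averaging over each downstream class's mapped source classes."""
    return [sum(source_logits[i] for i in idxs) / len(idxs) for idxs in mapping]

# Toy, hand-written embeddings (assumption: a real system would use a text encoder).
source_classes = ["tabby cat", "golden retriever", "sports car", "airliner"]
source_emb = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0], [0.0, 0.2, 1.0]]
target_classes = ["cat", "dog"]
target_emb = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

mapping = build_semantic_mapping(source_emb, target_emb, k=1)

# Hypothetical logits from the frozen pre-trained model for one input image.
source_logits = [2.0, 0.5, 0.1, 0.0]
target_logits = map_logits(source_logits, mapping)
predicted = target_classes[target_logits.index(max(target_logits))]
```

Because the mapping in this sketch is derived from class semantics alone and uses no downstream training data, it also illustrates why such a mapping could in principle support zero-shot transfer; whether SeMap works exactly this way is not stated in the abstract.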