With the development of large pre-trained vision-language models, how to effectively transfer the knowledge of such foundation models to downstream tasks has become a hot topic, especially in data-deficient scenarios. Recently, prompt tuning has become a popular solution: when adapting a vision-language model, researchers freeze the backbone parameters and design and tune only the prompts. On the one hand, delicately designed prompt tuning methods exhibit strong performance. On the other hand, their complicated structures and update rules substantially increase computation and storage costs. Motivated by the observation that the evolution of generalization capability in vision-language models closely tracks the trend of rank variation in the prompt matrix during adaptation, we design a new type of prompt, Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation. Our method substantially reduces the number of tunable parameters and the storage space required, which is beneficial in resource-limited scenarios. Extensive experiments further demonstrate the superiority of RLP. In particular, RLP achieves performance comparable to or even stronger than the latest state-of-the-art methods with an extremely small number of parameters. On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.
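To make the parameter saving of a low-rank prompt concrete, the following is a minimal sketch of one generic way to re-parameterize a prompt matrix as a frozen base plus a trainable low-rank product. The abstract does not specify RLP's exact formulation, so the function names (`low_rank_prompt`, `low_rank_param_count`), the zero initialization, and the shapes (4 prompt tokens, embedding width 512, rank 1) are all illustrative assumptions rather than the authors' implementation.

```python
import random


def low_rank_prompt(base, a, b):
    """Return base + a @ b for nested-list matrices.

    base: n_tokens x dim frozen initial prompt (hypothetical; not trained)
    a:    n_tokens x rank trainable factor
    b:    rank x dim trainable factor
    """
    n_tokens, dim, rank = len(base), len(base[0]), len(b)
    return [
        [base[i][j] + sum(a[i][k] * b[k][j] for k in range(rank)) for j in range(dim)]
        for i in range(n_tokens)
    ]


def full_param_count(n_tokens, dim):
    """Trainable entries when every element of the prompt matrix is tuned."""
    return n_tokens * dim


def low_rank_param_count(n_tokens, dim, rank):
    """Trainable entries when only the two low-rank factors are tuned."""
    return rank * (n_tokens + dim)


# Illustrative shapes: assumptions for this sketch, not values from the source.
N_TOKENS, DIM, RANK = 4, 512, 1

rng = random.Random(0)
base = [[0.0] * DIM for _ in range(N_TOKENS)]
a = [[rng.gauss(0, 0.02) for _ in range(RANK)] for _ in range(N_TOKENS)]
# Assumed zero init for b, so the prompt starts out equal to the frozen base.
b = [[0.0] * DIM for _ in range(RANK)]

prompt = low_rank_prompt(base, a, b)

print(full_param_count(N_TOKENS, DIM))            # 2048
print(low_rank_param_count(N_TOKENS, DIM, RANK))  # 516
```

Under these assumed shapes, tuning only the two factors needs 516 values instead of 2048, and only those factors would have to be stored per downstream task. The count grows as `rank * (n_tokens + dim)`, so the saving shrinks as the rank increases.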