With the development of large pre-trained vision-language models, effectively transferring the knowledge of such foundation models to downstream tasks has become a hot topic, especially in data-deficient scenarios. Recently, prompt tuning has become a popular solution: when adapting vision-language models, researchers freeze the backbone parameters and design and tune only the prompts. On the one hand, delicately designed prompt tuning methods exhibit strong performance. On the other hand, their complicated structures and update rules substantially increase computation and storage costs. Motivated by the observation that the evolution of generalization capability in vision-language models closely tracks the trend of rank variations in the prompt matrix during adaptation, we design a new type of prompt, the Re-parameterized Low-rank Prompt (RLP), for both efficient and effective adaptation. Our method substantially reduces the number of tunable parameters and the storage space required, which is beneficial in resource-limited scenarios. Extensive experiments further demonstrate the superiority of RLP. In particular, RLP achieves comparable or even stronger performance than the latest state-of-the-art methods with an extremely small number of parameters. On a series of tasks over 11 datasets, RLP significantly increases the average downstream accuracy of classic prompt tuning by up to 5.25% using merely 0.5K parameters.
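To make the parameter-saving argument concrete, the following is a minimal sketch of a generic low-rank prompt factorization. It is not the authors' implementation: the abstract does not specify how RLP is parameterized or what the re-parameterization step involves. The function names `low_rank_prompt` and `count_params`, the initialization scale, and the sizes (4 prompt tokens, embedding width 512, rank 1) are all assumptions chosen for illustration.

```python
import random


def count_params(n_tokens, dim, rank=None):
    """Count tunable parameters of a prompt matrix of shape (n_tokens, dim).

    With rank=None the full matrix is tuned; otherwise only the two thin
    factors A (n_tokens x rank) and B (rank x dim) are tuned.
    """
    if rank is None:
        return n_tokens * dim
    return rank * (n_tokens + dim)


def low_rank_prompt(n_tokens, dim, rank, seed=0):
    """Build a prompt matrix as the product A @ B of two thin factors.

    Hypothetical parameterization for illustration only; the paper's
    actual design may differ.
    """
    rng = random.Random(seed)
    # Small Gaussian initialization (the scale 0.02 is an assumption).
    a = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(n_tokens)]
    b = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
    # Plain-Python matrix product: prompt[i][j] = sum_k a[i][k] * b[k][j].
    prompt = [
        [sum(a[i][k] * b[k][j] for k in range(rank)) for j in range(dim)]
        for i in range(n_tokens)
    ]
    return a, b, prompt


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    print(count_params(4, 512))          # full prompt matrix: 2048
    print(count_params(4, 512, rank=1))  # rank-1 factors: 516
```

Under these assumed sizes, the factorized prompt stores a quarter of the parameters of the full matrix, and the saving grows as the rank shrinks relative to the matrix dimensions. Only the factors would need to be tuned and saved; the full prompt matrix can be rebuilt from them when needed.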