Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, fully fine-tuning these models for downstream tasks remains highly challenging because of the high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly reduced these costs by inserting lightweight prompt modules into the pre-trained models and tuning only these modules, which contain a small number of trainable parameters, while keeping the transformer backbone frozen. Although only a few parameters need to be adjusted, most PETuning methods still require a significant amount of downstream training data to achieve good results. Their performance is inadequate in low-data regimes, especially when there are only one or two examples per class. To this end, we first empirically identify that the poor performance is mainly due to the inappropriate initialization of the prompt modules, a finding that has also been verified for pre-trained language models. Next, we propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which first pre-trains the parameter-efficient tuning modules and then uses the pre-trained modules together with the pre-trained transformer backbone to perform parameter-efficient tuning on downstream tasks. Experimental results on five Fine-Grained Visual Classification (FGVC) datasets and the VTAB-1k datasets demonstrate that our proposed method significantly outperforms state-of-the-art PETuning methods.
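To make the initialization argument concrete, the following is a minimal sketch, not the authors' implementation, of a LoRA-style module on a single frozen weight matrix. The class `LowRankAdapter`, its `init` argument, and the stand-in "stage 1" values are all hypothetical names and numbers chosen for illustration. The sketch contrasts the usual from-scratch start (small random `A`, zero `B`) with a start that reuses module weights obtained in an earlier pre-training stage, which is the idea the PVP framework is built on.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LowRankAdapter:
    """Frozen weight W plus a trainable low-rank update B @ A (LoRA-style)."""

    def __init__(self, frozen_w, rank, init=None, seed=0):
        self.w = frozen_w  # backbone weight: kept frozen, never updated
        d_out, d_in = len(frozen_w), len(frozen_w[0])
        if init is not None:
            # Assumed PVP-style start: copy module weights from a pre-training stage.
            self.a = [row[:] for row in init["a"]]
            self.b = [row[:] for row in init["b"]]
        else:
            # From-scratch start: small random A and zero B, so B @ A is zero.
            rng = random.Random(seed)
            self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
            self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.w, x)                    # frozen backbone path
        delta = matvec(self.b, matvec(self.a, x))   # low-rank trainable path
        return [p + q for p, q in zip(base, delta)]

    def num_trainable(self):
        return sum(len(r) for r in self.a) + sum(len(r) for r in self.b)

    def state(self):
        return {"a": [r[:] for r in self.a], "b": [r[:] for r in self.b]}


d_in, d_out, rank = 8, 8, 2
rng = random.Random(1)
frozen_w = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

# Baseline: module initialized from scratch for the downstream task.
scratch = LowRankAdapter(frozen_w, rank)

# Stage 1 (hypothetical): module tuned on a large pre-training dataset.
# The constant 0.1 is only a stand-in for learned values.
stage1 = LowRankAdapter(frozen_w, rank)
stage1.b = [[0.1] * rank for _ in range(d_out)]

# Stage 2: the downstream module starts from the stage-1 weights.
downstream = LowRankAdapter(frozen_w, rank, init=stage1.state())
```

In this toy setting the module adds `rank * (d_in + d_out)` trainable values next to the `d_in * d_out` frozen ones. With the from-scratch start the module contributes nothing at step zero, so all task-specific behavior has to be learned from the few downstream examples, whereas the second start carries over whatever the earlier stage learned.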