Parameter-efficient transfer learning (PETL) is an emerging research direction that aims to adapt large-scale pre-trained models to downstream tasks inexpensively. Recent advances have greatly reduced storage costs for various vision tasks by updating or injecting a small number of parameters instead of fine-tuning the full model. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we prove that the adaptation modules, even with a complex structure, can be seamlessly integrated into most giant vision models via structural re-parameterization. This property makes RepAdapter zero-cost during inference. In addition to its computational efficiency, RepAdapter is more effective and lightweight than existing PETL methods due to its sparse structure and our careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets covering three vision tasks: image classification, video classification, and semantic segmentation. Experimental results show that RepAdapter outperforms state-of-the-art PETL methods in both performance and efficiency. For instance, by updating only 0.6% of the parameters, we improve the performance of ViT from 38.8 to 55.1 on Sun397. Its generalizability is also validated on a range of vision models, i.e., ViT, CLIP, Swin-Transformer, and ConvNeXt. Our source code is released at https://github.com/luogen1996/RepAdapter.
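The abstract's central claim, that an adaptation module can be folded into the model via structural re-parameterization, can be illustrated with a small numerical sketch. It assumes a purely linear adapter (a down-projection, an up-projection, and a residual connection, with no nonlinearity) placed directly before a frozen linear layer. The dense low-rank shape, the function names `adapter_then_linear` and `merge_adapter`, and the `scale` parameter are illustrative assumptions, not the paper's implementation, and RepAdapter's sparse structure is not reproduced here.

```python
import numpy as np


def adapter_then_linear(x, W_down, b_down, W_up, b_up, W, b, scale=1.0):
    """Training-time path: linear adapter with a residual, then a frozen linear layer."""
    h = x + scale * ((x @ W_down + b_down) @ W_up + b_up)
    return h @ W + b


def merge_adapter(W_down, b_down, W_up, b_up, W, b, scale=1.0):
    """Fold the linear adapter into the weights and bias of the following layer."""
    d = W_down.shape[0]
    # The adapter is an affine map: h = x @ A + c
    A = np.eye(d) + scale * (W_down @ W_up)
    c = scale * (b_down @ W_up + b_up)
    # Composing two affine maps gives one affine map: y = x @ (A @ W) + (c @ W + b)
    return A @ W, c @ W + b


# Hypothetical sizes: feature dim 8, adapter bottleneck 2, output dim 5.
rng = np.random.default_rng(0)
d, r, out = 8, 2, 5
x = rng.normal(size=(3, d))
W_down, b_down = rng.normal(size=(d, r)), rng.normal(size=r)
W_up, b_up = rng.normal(size=(r, d)), rng.normal(size=d)
W, b = rng.normal(size=(d, out)), rng.normal(size=out)

y_train = adapter_then_linear(x, W_down, b_down, W_up, b_up, W, b, scale=0.5)
W_merged, b_merged = merge_adapter(W_down, b_down, W_up, b_up, W, b, scale=0.5)
y_infer = x @ W_merged + b_merged  # a single linear layer, no adapter at inference

print(np.allclose(y_train, y_infer))
```

Because the adapter in this sketch contains no nonlinearity, it composes with the next layer into one affine map, so the merged layer has the same shape as the original layer and adds no inference-time computation. Inserting an activation function inside the adapter would break this equivalence.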