Fine-tuning visual models has shown promising performance on many downstream visual tasks. With the rapid development of pre-trained visual foundation models, visual tuning has moved beyond the standard modus operandi of fine-tuning the whole pre-trained model or just the fully connected layer. Instead, recent advances can outperform full tuning of all pre-trained parameters while updating far fewer parameters, enabling edge devices and downstream applications to reuse the increasingly large foundation models deployed on the cloud. To help researchers get the full picture of visual tuning and its future directions, this survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models. Specifically, it provides a detailed background of visual tuning and categorizes recent visual tuning techniques into five groups: fine-tuning, prompt tuning, adapter tuning, parameter tuning, and remapping tuning. Finally, it offers some exciting research directions for prospective pre-training and various interactions in visual tuning.