Visual Parameter-Efficient Tuning (VPET) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters and freezes the vast majority, which eases the storage burden and optimization difficulty. However, existing VPET methods introduce trainable parameters at the same positions across different tasks, relying solely on human heuristics and neglecting the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold, SPT further boosts the representational capability by applying any existing structured tuning method, e.g., LoRA or Adapter, in place of directly tuning the selected sensitive parameters (unstructured tuning), while staying within the budget. Extensive experiments on a wide range of downstream recognition tasks show that SPT is complementary to existing VPET methods and largely boosts their performance, e.g., SPT improves Adapter with a supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy on the FGVC and VTAB-1k benchmarks, respectively, reaching state-of-the-art performance. Source code is available at https://github.com/ziplab/SPT
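The two-stage allocation described in the abstract can be illustrated with a minimal sketch. It is not the released implementation (see the repository for that) and rests on several assumptions that the abstract does not state: the sensitivity score is taken to be the squared gradient of each parameter accumulated on task data, the budget is spent by a global top-k selection, and the cost of a structured module is not charged against the budget. The function name `allocate_trainable_params` and the `grads` input are hypothetical.

```python
import numpy as np


def allocate_trainable_params(grads, budget, threshold):
    """Decide, per weight matrix, between structured and unstructured tuning.

    grads:     dict mapping a weight-matrix name to its gradient array,
               accumulated on task data (hypothetical input).
    budget:    number of individual parameters allowed to be selected.
    threshold: a matrix with more than this many sensitive parameters is
               assigned a structured module (e.g. LoRA or Adapter).
    """
    # ASSUMPTION: sensitivity = squared gradient; the abstract only says the
    # sensitive parameters are identified "in a data-dependent way".
    scores = {name: np.asarray(g, dtype=float) ** 2 for name, g in grads.items()}
    flat = np.concatenate([s.ravel() for s in scores.values()])
    budget = min(int(budget), flat.size)

    # Stage 1: globally select the `budget` most sensitive parameters.
    selected = np.zeros(flat.size, dtype=bool)
    if budget > 0:
        top = np.argpartition(flat, -budget)[-budget:]
        selected[top] = True

    # Stage 2: choose the tuning granularity for each weight matrix.
    plan = {}
    offset = 0
    for name, s in scores.items():
        mask = selected[offset:offset + s.size].reshape(s.shape)
        offset += s.size
        count = int(mask.sum())
        if count > threshold:
            # Many sensitive entries: use a structured module instead.
            plan[name] = {"mode": "structured", "num_sensitive": count}
        elif count > 0:
            # Few sensitive entries: tune only the masked positions.
            plan[name] = {"mode": "unstructured", "num_sensitive": count, "mask": mask}
        else:
            plan[name] = {"mode": "frozen", "num_sensitive": 0}
    return plan
```

Calling the function with per-matrix gradients, a budget, and a threshold returns a plan that marks each matrix as `structured`, `unstructured` (with a boolean mask of the positions to tune), or `frozen`. A faithful implementation would also need to account for the parameters a structured module itself consumes, which this sketch deliberately leaves out.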