Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models. However, with the exponential growth of model sizes, conventional full fine-tuning, which must store a separate copy of the network for each task, incurs ever-increasing storage and transmission overhead. Adapter-based Parameter-Efficient Tuning (PET) methods address this challenge by tuning lightweight adapters inserted into frozen pre-trained models. In this paper, we investigate how to make adapters even more efficient, reaching a new minimum for the storage size required by a task-specific fine-tuned network. Inspired by the observation that adapter parameters converge to flat local minima, we find that adapters are robust to noise in parameter space, and hence also robust to low numerical precision. To train low-precision adapters, we propose a computationally efficient quantization method that minimizes the quantization error. Through extensive experiments, we find that low-precision adapters exhibit minimal performance degradation, and that even 1-bit precision is sufficient for adapters. The experimental results show that 1-bit adapters outperform all other PET methods on both the VTAB-1K benchmark and few-shot FGVC tasks while requiring the smallest storage size. Our findings show, for the first time, the significant potential of quantization techniques in PET, providing a general solution for enhancing the parameter efficiency of adapter-based PET methods. Code: https://github.com/JieShibo/PETL-ViT
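The abstract does not spell out the quantizer. As a hedged illustration only, here is one standard way to binarize an adapter weight tensor while minimizing the squared quantization error: keep the sign of each weight and use a single full-precision scale equal to the mean absolute weight (the well-known XNOR-Net-style closed form). The per-tensor scale, the 32-bit storage of that scale, and the helper names `binarize`, `dequantize`, `squared_error`, and `storage_bits` are assumptions for this sketch, not the paper's implementation. Training with such a quantizer would also need a gradient approximation such as the straight-through estimator, which is omitted here.

```python
# Hypothetical sketch: 1-bit quantization of adapter weights with one scale
# chosen to minimize sum((w - alpha * s)^2), s in {-1, +1}.
# Assumption: per-tensor scale, sign binarization; not necessarily the
# paper's exact method.


def binarize(weights):
    """Return (alpha, signs) minimizing the squared quantization error."""
    # For any fixed alpha > 0, the best code is s_i = sign(w_i); map 0 to +1.
    signs = [1 if w >= 0 else -1 for w in weights]
    # With s_i = sign(w_i), setting d/d(alpha) of the error to zero
    # gives alpha = mean(|w_i|).
    alpha = sum(abs(w) for w in weights) / len(weights)
    return alpha, signs


def dequantize(alpha, signs):
    """Reconstruct approximate full-precision weights from 1-bit codes."""
    return [alpha * s for s in signs]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def storage_bits(num_params, bits_per_param, num_scales=1, scale_bits=32):
    """Bits needed for the weight codes plus any full-precision scales."""
    return num_params * bits_per_param + num_scales * scale_bits


if __name__ == "__main__":
    w = [0.5, -0.25, 0.75, -1.0]
    alpha, s = binarize(w)
    print(alpha, s)  # 0.625 [1, -1, 1, -1]
    print(squared_error(w, dequantize(alpha, s)))  # 0.3125
```

Under these assumptions, 4,096 adapter weights take 131,072 bits in FP32 (`storage_bits(4096, 32, num_scales=0)`) but only 4,128 bits after binarization (4,096 sign bits plus one 32-bit scale), roughly a 32× reduction.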