While parameter-efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness with large-scale ConvNets on Computer Vision (CV) tasks remains under-studied. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is light-weight, domain-transferable, and architecture-agnostic, with generalized performance on different tasks. When transferring to downstream tasks, Conv-Adapter learns task-specific feature modulation of the backbone's intermediate representations while keeping the pre-trained parameters frozen. It introduces only a small number of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, and can also be applied to transformer-based backbones. Conv-Adapter outperforms previous PET baseline methods and achieves performance comparable to or surpassing that of full fine-tuning on 23 classification tasks across various domains. It also shows superior performance on few-shot classification, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks, with more than a 50% reduction in parameters but performance comparable to traditional full fine-tuning.
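To give a feel for why tuning an adapter can cost so few parameters compared with full fine-tuning, the following is a minimal parameter-accounting sketch. It is not the paper's implementation: the adapter design (a 1x1 down-projection, a depth-wise 3x3 convolution, and a 1x1 up-projection), the helper names `conv_params` and `adapter_params`, and the layer shapes are all assumptions chosen for illustration.

```python
# Illustrative parameter accounting only. The adapter layout and all
# shapes below are assumptions, not taken from the paper.

def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Number of learnable values in a 2D convolution with a k x k kernel."""
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def adapter_params(channels, reduction, k=3):
    """Parameters of a hypothetical bottleneck adapter:
    1x1 down-projection -> depth-wise k x k conv -> 1x1 up-projection."""
    hidden = channels // reduction
    down = conv_params(channels, hidden, 1)
    depthwise = conv_params(hidden, hidden, k, groups=hidden)
    up = conv_params(hidden, channels, 1)
    return down + depthwise + up


# A frozen 3x3 convolution with 256 input and 256 output channels (assumed shape).
frozen = conv_params(256, 256, 3)

# Trainable adapter attached to that layer, with an assumed reduction factor of 4.
trainable = adapter_params(256, reduction=4)

fraction = trainable / frozen
print(f"frozen={frozen}, trainable={trainable}, fraction={fraction:.4f}")
```

With these assumed shapes, the adapter adds 33,344 trainable weights next to 589,824 frozen ones, which is under 6% for this single layer. The figure depends entirely on the assumed reduction factor and layer shape and should not be read as a reproduction of the 3.5% reported for ResNet50.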