Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method: instead of finetuning all the weights of the network, we train only a carefully chosen subset of layers and keep the remaining weights frozen at their initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses full finetuning when training data is scarce. Because only the selected layers are specific to each new task, SubTuning allows deploying new tasks at minimal computational cost while retaining the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, in which different tasks do not interfere with one another and yet share most of the resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
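To make the core idea concrete, the following is a minimal toy sketch of training only a chosen subset of layers while the rest stay frozen at their pretrained values. It is an illustration under stated assumptions, not the implementation used in this work: the "network" is a chain of three scalar affine layers, the pretrained weights are made up, the subset is picked by hand rather than by any selection procedure, and every name (`make_pretrained_model`, `subtune`, `trainable`) is hypothetical.

```python
# Toy sketch of subset finetuning on a chain of scalar affine layers y = w*x + b.
# All names and values are hypothetical; this is not the paper's implementation.

def make_pretrained_model():
    # Stand-in for pretrained weights (made-up values).
    return [
        {"name": "layer1", "w": 1.0, "b": 0.0},
        {"name": "layer2", "w": 0.5, "b": 0.1},
        {"name": "layer3", "w": 2.0, "b": -0.2},
    ]

def forward(model, x):
    # Returns the input seen by each layer, plus the final output.
    inputs = []
    for layer in model:
        inputs.append(x)
        x = layer["w"] * x + layer["b"]
    return inputs, x

def mse(model, data):
    # Mean squared error over (input, target) pairs.
    return sum((forward(model, x)[1] - t) ** 2 for x, t in data) / len(data)

def subtune(model, data, trainable, lr=0.01, epochs=200):
    # Plain SGD on squared error; only layers named in `trainable` are updated.
    for _ in range(epochs):
        for x, t in data:
            inputs, y = forward(model, x)
            grad_out = 2.0 * (y - t)  # d(loss)/d(output)
            # Backpropagate through the chain, last layer first.
            for layer, layer_in in zip(reversed(model), reversed(inputs)):
                grad_w = grad_out * layer_in
                grad_b = grad_out
                # Gradient w.r.t. this layer's input, using the pre-update weight.
                grad_out = grad_out * layer["w"]
                if layer["name"] in trainable:
                    layer["w"] -= lr * grad_w
                    layer["b"] -= lr * grad_b
                # Frozen layers still pass gradients back but keep their weights.
    return model

# New task: fit t = 3x + 1 by training only the middle layer.
data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
model = make_pretrained_model()
loss_before = mse(model, data)
subtune(model, data, trainable={"layer2"})
loss_after = mse(model, data)
print(loss_before, loss_after)
```

In this sketch, a second task would keep its own copy of `layer2` only, while `layer1` and `layer3` are shared unchanged across tasks, which is the property that lets tasks avoid interfering with one another.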