Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses the performance of full finetuning when training data is scarce. Therefore, SubTuning allows deploying new tasks at minimal computational cost, while enjoying the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, where different tasks do not interfere with one another, and yet share most of the resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
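To make the core idea concrete, the following minimal sketch illustrates training only a chosen subset of layers while every other layer stays at its pretrained value. It is an illustration under stated assumptions, not the implementation used in this work: the `Layer` class, the function names, the layer names, and the gradient values are all hypothetical, and the choice of which layers to train is simply given as input rather than selected by any particular procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Layer:
    """Hypothetical stand-in for one network layer: a named list of weights."""
    name: str
    weights: List[float]


def sgd_step_on_subset(
    layers: List[Layer],
    grads: Dict[str, List[float]],
    trainable: Set[str],
    lr: float = 0.1,
) -> None:
    """Apply one SGD update, modifying only layers named in `trainable`."""
    for layer in layers:
        if layer.name not in trainable:
            continue  # frozen layer: keeps its pretrained weights
        layer.weights = [
            w - lr * g for w, g in zip(layer.weights, grads[layer.name])
        ]


def trainable_fraction(layers: List[Layer], trainable: Set[str]) -> float:
    """Fraction of all weights that belong to the trained subset."""
    total = sum(len(layer.weights) for layer in layers)
    tuned = sum(len(layer.weights) for layer in layers if layer.name in trainable)
    return tuned / total


# Toy "pretrained" model with made-up layer names and values (assumptions).
pretrained = [
    Layer("block1", [0.5, -0.2]),
    Layer("block2", [1.0, 0.3, 0.1]),
    Layer("head", [0.0]),
]
subset = {"block2", "head"}  # assumed selection; not derived from the paper
toy_grads = {layer.name: [1.0] * len(layer.weights) for layer in pretrained}

sgd_step_on_subset(pretrained, toy_grads, subset, lr=0.1)
```

After the step, `block1` is unchanged while `block2` and `head` have moved, and `trainable_fraction(pretrained, subset)` reports the share of weights that a new task would need to store separately. In a multi-task setting, the frozen layers could be shared across tasks at inference time, with only the trained subset kept per task.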