Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses the performance of full finetuning when training data is scarce. Therefore, SubTuning allows deploying new tasks at minimal computational cost, while enjoying the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, where different tasks do not interfere with one another, and yet share most of the resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
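The core idea above (update only a chosen subset of layers and leave the remaining weights at their pretrained values) can be illustrated with a toy sketch. This is not the paper's implementation: the scalar-layer "network", the weight values, the task data, and the helper names `forward`, `loss`, `grad`, and `subtune` are all hypothetical, chosen only so the example runs with the Python standard library.

```python
# Illustrative sketch only: a toy "network" of scalar linear layers,
# not the architecture or training setup used in the paper.

def forward(weights, x):
    """Apply each scalar layer in turn: h <- w * h."""
    h = x
    for w in weights:
        h = w * h
    return h

def loss(weights, data):
    """Mean squared error over (x, y) pairs."""
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)

def grad(weights, data, i):
    """Gradient of the mean squared error with respect to weights[i]."""
    others = 1.0
    for j, w in enumerate(weights):
        if j != i:
            others *= w
    total = 0.0
    for x, y in data:
        total += 2.0 * (forward(weights, x) - y) * others * x
    return total / len(data)

def subtune(pretrained, data, trainable, lr=0.05, steps=200):
    """Gradient descent that updates only the layers indexed by `trainable`."""
    weights = list(pretrained)  # copy, so the pretrained weights stay intact
    for _ in range(steps):
        # Compute gradients only for the selected layers, then update them.
        grads = {i: grad(weights, data, i) for i in trainable}
        for i, g in grads.items():
            weights[i] -= lr * g
    return weights

pretrained = [1.0, 0.5, 2.0, 1.0]      # hypothetical "pretrained" weights
task_data = [(1.0, 3.0), (2.0, 6.0)]   # hypothetical new task: y = 3x
trainable = {2}                        # hypothetical choice of layer subset

tuned = subtune(pretrained, task_data, trainable)
frozen = [i for i in range(len(pretrained)) if i not in trainable]
```

In this sketch, every layer listed in `frozen` keeps its pretrained value, so a second task tuned with its own `trainable` subset could reuse those layers unchanged at inference time. How the subset is actually selected is the subject of the paper and is not modeled here.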