Adapting models pre-trained on large-scale datasets to a variety of downstream tasks is a common strategy in deep learning. Consequently, parameter-efficient fine-tuning methods have emerged as a promising way to adapt pre-trained models to different tasks while training only a minimal number of parameters. While most of these methods are designed for single-task adaptation, parameter-efficient training in Multi-Task Learning (MTL) architectures remains largely unexplored. In this paper, we introduce MTLoRA, a novel framework for parameter-efficient training of MTL models. MTLoRA employs Task-Agnostic and Task-Specific Low-Rank Adaptation modules, which effectively disentangle the parameter space in MTL fine-tuning, enabling the model to handle both task specialization and task interaction within MTL contexts. We apply MTLoRA to hierarchical-transformer-based MTL architectures, adapting them to multiple downstream dense prediction tasks. Our extensive experiments on the PASCAL dataset show that MTLoRA achieves higher accuracy on downstream tasks than fully fine-tuning the MTL model, while reducing the number of trainable parameters by 3.6x. Furthermore, MTLoRA establishes a Pareto-optimal trade-off between the number of trainable parameters and downstream-task accuracy, outperforming current state-of-the-art parameter-efficient training methods in both accuracy and efficiency. Our code is publicly available.
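The core idea of combining a shared (task-agnostic) low-rank adapter with per-task (task-specific) low-rank adapters on top of a frozen pretrained weight can be sketched as follows. This is a minimal illustration based on our reading of the abstract, not the paper's actual implementation; class and variable names (`MTLoRALinear`, `A_shared`, `B_task`, the rank and dimension values) are illustrative assumptions.

```python
import numpy as np

class MTLoRALinear:
    """Sketch of a linear layer with one task-agnostic and several
    task-specific low-rank (LoRA-style) adapters. Only the low-rank
    factors A/B would be trained; the pretrained weight W stays frozen."""

    def __init__(self, d_in, d_out, rank, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
        # Shared (task-agnostic) adapter; B is zero-initialized so the
        # low-rank update B @ A starts as a no-op, as in standard LoRA.
        self.A_shared = rng.standard_normal((rank, d_in)) * 0.01
        self.B_shared = np.zeros((d_out, rank))
        # One (task-specific) adapter pair per task.
        self.A_task = [rng.standard_normal((rank, d_in)) * 0.01
                       for _ in range(num_tasks)]
        self.B_task = [np.zeros((d_out, rank)) for _ in range(num_tasks)]

    def forward(self, x, task=None):
        """x: (batch, d_in). task=None yields the shared (task-agnostic)
        output; an integer task index adds that task's specific update."""
        y = x @ (self.W + self.B_shared @ self.A_shared).T
        if task is not None:
            y = y + x @ (self.B_task[task] @ self.A_task[task]).T
        return y

layer = MTLoRALinear(d_in=8, d_out=4, rank=2, num_tasks=3)
x = np.ones((1, 8))
# With zero-initialized B factors, every adapter update is initially zero,
# so each task's output matches the frozen pretrained layer's output.
assert np.allclose(layer.forward(x, task=0), x @ layer.W.T)
```

The parameter savings come from training only the A/B factors: for each adapted weight of size `d_out x d_in`, the trainable cost per adapter is `rank * (d_in + d_out)` instead of `d_in * d_out`.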