ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning $n$ tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability and addressing cases involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come at the cost of introducing a substantial number of additional parameters. In this work, we address this issue by exploiting the effectiveness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that ScaLearn, in addition to retaining the benefits of two-stage MTL, consistently outperforms strong baselines with only a small number of transfer parameters (roughly 0.35% of those of AdapterFusion). Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters through uniform scaling and layer-sharing, achieving similarly competitive results with only $8$ transfer parameters for each target task. Our proposed approach thus demonstrates the promise of simple scaling for more efficient task transfer.
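To make the idea of linearly scaling source-adapter outputs concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that each of T source adapters produces an output vector of dimension d at a given layer, and that the transfer stage learns either one scale per task and dimension or, in the uniform variant, one scalar per task. The function names, the hidden size of 768, the 12 layers, and the choice of 8 source tasks are all illustrative assumptions; the abstract only states that the smallest variant uses 8 transfer parameters per target task.

```python
from typing import List, Sequence

Vector = List[float]


def scale_and_sum(adapter_outputs: Sequence[Vector],
                  scales: Sequence[Vector]) -> Vector:
    """Combine T source-adapter outputs using per-task, per-dimension scales.

    adapter_outputs: T vectors of length d (one per source adapter).
    scales:          T vectors of length d (hypothetical learned parameters).
    Returns the element-wise scaled sum, a vector of length d.
    """
    dim = len(adapter_outputs[0])
    combined = [0.0] * dim
    for output, scale in zip(adapter_outputs, scales):
        for i in range(dim):
            combined[i] += scale[i] * output[i]
    return combined


def uniform_scale_and_sum(adapter_outputs: Sequence[Vector],
                          scales: Sequence[float]) -> Vector:
    """Uniform variant: a single scalar per source task instead of a vector."""
    dim = len(adapter_outputs[0])
    combined = [0.0] * dim
    for output, scale in zip(adapter_outputs, scales):
        for i in range(dim):
            combined[i] += scale * output[i]
    return combined


def count_transfer_params(num_tasks: int, hidden_dim: int, num_layers: int,
                          uniform: bool, share_layers: bool) -> int:
    """Count scaling parameters under the assumptions stated above."""
    per_layer = num_tasks * (1 if uniform else hidden_dim)
    return per_layer * (1 if share_layers else num_layers)


# Toy example with T = 2 source adapters and d = 2 dimensions.
outputs = [[1.0, 2.0], [3.0, 4.0]]
per_dim = scale_and_sum(outputs, [[0.5, 0.0], [1.0, 0.5]])
uniform = uniform_scale_and_sum(outputs, [0.5, 1.0])

# Hypothetical setup: 8 source tasks, hidden size 768, 12 layers.
smallest = count_transfer_params(8, 768, 12, uniform=True, share_layers=True)
largest = count_transfer_params(8, 768, 12, uniform=False, share_layers=False)
```

Under these assumptions, combining uniform scaling with layer-sharing leaves one scalar per source task, which would be consistent with the 8 transfer parameters mentioned above if 8 source tasks are used. The per-dimension, per-layer variant in the same hypothetical setup needs 8 × 768 × 12 parameters.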
