AdaMerging: Adaptive Model Merging for Multi-Task Learning

Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, the challenge emerges of how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11\% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.

翻译：多任务学习（MTL）旨在使单一模型能够同时处理多个任务。最新进展"任务算术"揭示，针对不同任务分别微调的多个模型可直接合并为单一模型，从而在无需使用初始训练数据进行重训练的情况下执行MTL。然而，这种模型的直接相加通常会导致合并后模型整体性能显著下降，其原因是多个任务间存在潜在冲突与复杂关联。由此引出的挑战是：如何在不使用原始训练数据的情况下更有效地合并预训练模型？本文提出了一种名为自适应模型合并（AdaMerging）的创新技术。该方法旨在以任务级或层级方式自动学习模型合并系数，且无需依赖原始训练数据。具体而言，AdaMerging作为一种自动化的无监督任务算术方案，利用多任务设置中未标注测试样本的熵最小化作为替代目标函数，迭代优化多个模型的合并系数。我们在八个任务上的实验结果表明，所提出的AdaMerging方案具有显著有效性。与当前最先进的任务算术合并方案相比，AdaMerging性能提升了11%。值得注意的是，AdaMerging在面对未见过的下游任务时展现出更强的泛化能力，并且在测试阶段可能发生的数据分布偏移下表现出显著增强的鲁棒性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日