While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for a distinct task, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, making it difficult to effectively balance parameter competition across tasks. This paper introduces PCB-Merging (Parameter Competition Balancing), a lightweight, training-free technique that adjusts the coefficient of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results show that our approach achieves substantial performance gains across multiple modalities, domains, model sizes, numbers of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods. The code is publicly available at: \url{https://github.com/duguodong7/pcb-merging}.
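To make the pipeline concrete, the sketch below illustrates the overall flow described in the abstract: compute task vectors, score each parameter by an intra-task importance term and an inter-task similarity term, drop low-scoring entries, rescale the survivors, and merge. The specific scoring functions (`intra`, `inter`), the softmax temperature, and the drop rule here are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def pcb_merge_sketch(pretrained, finetuned_list, drop_ratio=0.1, temperature=2.0):
    """Hedged sketch of parameter-competition-balanced merging.

    pretrained:     flat parameter vector of the base model, shape (D,)
    finetuned_list: list of T flat parameter vectors fine-tuned on distinct tasks
    The exact scoring functions are assumptions for illustration only.
    """
    # Task vectors: what each fine-tuned model changed relative to the base.
    tvs = np.stack([ft - pretrained for ft in finetuned_list])  # shape (T, D)
    T, D = tvs.shape

    # Intra-balancing (assumed form): importance of each parameter within its
    # own task, via a softmax over scaled squared update magnitudes.
    intra = np.exp(temperature * T * tvs**2)
    intra = intra / intra.sum(axis=1, keepdims=True)

    # Inter-balancing (assumed form): how much each task's update agrees with
    # the other tasks' updates at the same parameter position.
    inter = np.stack([
        np.mean([np.exp(tvs[i] * tvs[j]) for j in range(T)], axis=0)
        for i in range(T)
    ])

    # Combined parameter-competition score per (task, parameter) entry.
    score = intra * inter

    # Drop the lowest-scoring fraction of entries in each task's row.
    k = int(drop_ratio * D)
    if k > 0:
        thresh = np.sort(score, axis=1)[:, k - 1:k]
        score = np.where(score > thresh, score, 0.0)

    # Rescale surviving scores so weights at each parameter position sum to 1.
    denom = score.sum(axis=0, keepdims=True)
    weights = np.where(denom > 0, score / np.where(denom > 0, denom, 1.0), 0.0)

    # Merged model: base plus the competition-weighted combination of updates.
    return pretrained + (weights * tvs).sum(axis=0)
```

As a sanity check on the rescaling, merging two copies of the same fine-tuned model should reproduce that model, since the two identical task vectors split the weight evenly at every position.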