While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for a distinct task, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, making it difficult to effectively balance parameter competition across tasks. This paper introduces PCB-Merging (Parameter Competition Balancing), a lightweight, training-free technique that adjusts the coefficient of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results show that our approach achieves substantial performance gains across multiple modalities, domains, model sizes, numbers of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods. The code is publicly available at: \url{https://github.com/duguodong7/pcb-merging}.
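To make the pipeline concrete, the sketch below illustrates the overall flow described in the abstract: compute task vectors, score each parameter by an intra-task importance term and an inter-task similarity term, drop low-scoring entries, rescale the survivors, and merge. The specific scoring functions (`intra`, `inter`), the softmax temperature, and the drop rule here are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def pcb_merge_sketch(pretrained, finetuned_list, drop_ratio=0.1, temperature=2.0):
    """Hedged sketch of parameter-competition-balanced merging.

    pretrained:     flat parameter vector of the base model, shape (D,)
    finetuned_list: list of T flat parameter vectors fine-tuned on distinct tasks
    The exact scoring functions are assumptions for illustration only.
    """
    # Task vectors: what each fine-tuned model changed relative to the base.
    tvs = np.stack([ft - pretrained for ft in finetuned_list])  # shape (T, D)
    T, D = tvs.shape

    # Intra-balancing (assumed form): importance of each parameter within its
    # own task, via a softmax over scaled squared update magnitudes.
    intra = np.exp(temperature * T * tvs**2)
    intra = intra / intra.sum(axis=1, keepdims=True)

    # Inter-balancing (assumed form): how much each task's update agrees with
    # the other tasks' updates at the same parameter position.
    inter = np.stack([
        np.mean([np.exp(tvs[i] * tvs[j]) for j in range(T)], axis=0)
        for i in range(T)
    ])

    # Combined parameter-competition score per (task, parameter) entry.
    score = intra * inter

    # Drop the lowest-scoring fraction of entries in each task's row.
    k = int(drop_ratio * D)
    if k > 0:
        thresh = np.sort(score, axis=1)[:, k - 1:k]
        score = np.where(score > thresh, score, 0.0)

    # Rescale surviving scores so weights at each parameter position sum to 1.
    denom = score.sum(axis=0, keepdims=True)
    weights = np.where(denom > 0, score / np.where(denom > 0, denom, 1.0), 0.0)

    # Merged model: base plus the competition-weighted combination of updates.
    return pretrained + (weights * tvs).sum(axis=0)
```

As a sanity check on the rescaling, merging two copies of the same fine-tuned model should reproduce that model, since the two identical task vectors split the weight evenly at every position.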