Recently, model merging techniques have emerged as a way to combine multiple single-talent models into a single multi-talent model. However, prior efforts in this field have either required additional training or fine-tuning, or assumed that the models share the same pre-trained initialization. In this work, we identify a common drawback of prior methods: the inconsistency of unit similarity between the weight space and the activation space. To address this inconsistency, we propose a novel model merging framework, termed merging under dual-space constraints (MuDSC). Specifically, instead of maximizing the objective of a single space alone, we advocate searching for permutation matrices that lie in a region of uniformly high similarity in both spaces, obtained through a linear combination of the activation and weight similarity matrices. To improve applicability, we also incorporate adaptations for grouped structures, including Multi-Head Attention and Group Normalization. Comprehensive experimental comparisons demonstrate that MuDSC significantly boosts the performance of merged models across various task combinations and architectures. Furthermore, visualizing the merged model within the multi-task loss landscape reveals that MuDSC places the merged model in the overlapping region featuring uniformly lower loss for each task. Our code is publicly available at https://github.com/zju-vipa/training_free_model_merging.
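The core idea of searching for a permutation under a linear combination of the two similarity matrices can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the interpolation weight `alpha`, and the use of the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) to solve the resulting unit-matching problem are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dual_space_permutation(sim_weight, sim_act, alpha=0.5):
    """Sketch of dual-space unit matching (hypothetical helper, not MuDSC's code).

    sim_weight: (n, n) unit-similarity matrix computed in the weight space.
    sim_act:    (n, n) unit-similarity matrix computed in the activation space.
    alpha:      interpolation coefficient between the two spaces (assumed).
    Returns an (n, n) 0/1 permutation matrix P maximizing the combined similarity.
    """
    # Linearly combine the weight-space and activation-space similarities,
    # so the matching is driven by regions where BOTH spaces agree.
    sim = alpha * sim_weight + (1.0 - alpha) * sim_act

    # Solve the linear assignment problem: pick one unit of model B per
    # unit of model A so that total combined similarity is maximal.
    rows, cols = linear_sum_assignment(sim, maximize=True)

    P = np.zeros_like(sim)
    P[rows, cols] = 1.0
    return P
```

In a matching-based merging pipeline, `P` would then be applied to permute the units of one model before averaging its weights with the other's, so that corresponding units are aligned in both spaces.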