Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can perform only a single task and do not benefit from one another. Recently, model merging techniques have emerged as a way to combine multiple task-specific models into a single multitask model without additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that changed only a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, numbers of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging.
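The three steps named in the abstract (trim, elect sign, merge) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not fix the details, so several choices here are assumptions: models are represented as flat parameter vectors, "changed only a small amount" is interpreted as "not among the largest-magnitude changes" (controlled by a hypothetical `keep_fraction` parameter), the agreed-upon sign is taken to be the sign of the summed trimmed changes, and the hypothetical `scale` factor weights the merged change before it is added back to the pre-trained parameters. The function name `ties_merge_sketch` is likewise illustrative.

```python
import numpy as np


def ties_merge_sketch(pretrained, finetuned, keep_fraction=0.2, scale=1.0):
    """Illustrative trim / elect-sign / merge over flat parameter vectors.

    All parameter names and defaults are assumptions made for this sketch.
    """
    # Change of each fine-tuned model relative to the shared pre-trained model.
    deltas = np.stack([ft - pretrained for ft in finetuned])  # (n_models, n_params)

    # (1) Trim: per model, reset all but the largest-magnitude changes to zero.
    n_keep = max(1, int(round(keep_fraction * deltas.shape[1])))
    trimmed = np.zeros_like(deltas)
    for i, delta in enumerate(deltas):
        keep = np.argsort(np.abs(delta))[-n_keep:]
        trimmed[i, keep] = delta[keep]

    # (2) Elect sign: per parameter, take the sign of the summed trimmed changes
    # (assumed rule: the direction with the larger total magnitude wins).
    elected = np.sign(trimmed.sum(axis=0))

    # (3) Merge: average only the non-zero entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = agree.sum(axis=0)
    merged_delta = np.where(agree, trimmed, 0.0).sum(axis=0) / np.maximum(counts, 1)

    return pretrained + scale * merged_delta


# Toy usage: two "models" that disagree on the sign of parameter 0.
pretrained = np.zeros(4)
model_a = np.array([1.0, 0.1, -2.0, 0.0])
model_b = np.array([-3.0, 0.05, -1.0, 0.2])
merged = ties_merge_sketch(pretrained, [model_a, model_b], keep_fraction=0.5)
print(merged)
```

In this toy case the two models disagree on the sign of parameter 0. The negative direction has the larger total magnitude, so only `model_b` contributes there and the merged value is -3.0, whereas a plain average would give -1.0. The small changes at parameters 1 and 3 are trimmed away and stay at their pre-trained values.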