TIES-Merging: Resolving Interference When Merging Models

Transfer learning (i.e., further fine-tuning a pre-trained model on a downstream task) can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, numbers of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters, and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging
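The three steps can be illustrated on flat parameter vectors. This is a minimal sketch, not the code from the linked repository: it assumes each model is a plain list of floats, works on the difference between fine-tuned and pre-trained weights (often called a task vector), and uses the hypothetical parameter names `keep_frac` (fraction of largest-magnitude entries kept in the trim step) and `lam` (scaling applied to the merged task vector). The elected sign per parameter is taken as the sign of the summed trimmed values, so the direction with more total magnitude wins, and the merge averages only the values that agree with that sign.

```python
from typing import List

# Assumption for illustration: a "model" is a flat list of floats.
Vector = List[float]


def trim(task_vector: Vector, keep_frac: float) -> Vector:
    """Trim: keep the top keep_frac fraction of entries by magnitude, reset the rest to 0."""
    n = len(task_vector)
    k = max(1, round(keep_frac * n))
    # Indices of the k largest-magnitude entries.
    top = sorted(range(n), key=lambda i: abs(task_vector[i]), reverse=True)[:k]
    keep = set(top)
    return [v if i in keep else 0.0 for i, v in enumerate(task_vector)]


def elect_sign(trimmed: List[Vector]) -> List[int]:
    """Elect sign: per parameter, the sign of the summed trimmed values across models."""
    signs = []
    for values in zip(*trimmed):
        total = sum(values)
        signs.append(1 if total > 0 else -1 if total < 0 else 0)
    return signs


def disjoint_mean(trimmed: List[Vector], signs: List[int]) -> Vector:
    """Merge: average only the nonzero values whose sign matches the elected sign."""
    merged = []
    for values, s in zip(zip(*trimmed), signs):
        # v * s > 0 holds only when v is nonzero and has the elected sign.
        agree = [v for v in values if v * s > 0]
        merged.append(sum(agree) / len(agree) if agree else 0.0)
    return merged


def ties_merge(base: Vector, finetuned: List[Vector],
               keep_frac: float = 0.2, lam: float = 1.0) -> Vector:
    """Sketch of trim, elect sign, and disjoint merge; returns the merged weights."""
    task_vectors = [[f - b for f, b in zip(ft, base)] for ft in finetuned]
    trimmed = [trim(tv, keep_frac) for tv in task_vectors]
    signs = elect_sign(trimmed)
    merged_tv = disjoint_mean(trimmed, signs)
    return [b + lam * m for b, m in zip(base, merged_tv)]


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [0.5, -0.1, 0.3, 0.0]
    model_b = [0.4, 0.2, -0.6, 0.05]
    print(ties_merge(base, [model_a, model_b], keep_frac=0.5))
```

In this toy example the third parameter has a sign conflict (0.3 vs. -0.6). Plain averaging would shrink it to about -0.15, whereas the sketch keeps the dominant -0.6, and the small entries (-0.1, 0.2, 0.05) are reset by the trim step before they can interfere.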
