Resolving Interference When Merging Models

Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a way to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, numbers of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging
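
The three steps named in the abstract can be sketched on flat parameter vectors as follows. This is a minimal illustration, not the authors' implementation (see the linked repository for that). Several details are assumptions that the abstract does not specify: "changed a small amount" is interpreted as "outside the top `keep_fraction` of changes by magnitude", the agreed-upon sign is taken as the sign of the summed trimmed changes, aligned values are averaged, and the `scale` factor and all function names are hypothetical.

```python
from typing import List

Vector = List[float]


def trim(task_vector: Vector, keep_fraction: float) -> Vector:
    """Step 1: reset small changes to zero, keeping the largest-magnitude entries.

    Assumption: "small" means outside the top `keep_fraction` by magnitude.
    Entries tied with the threshold are all kept.
    """
    k = max(1, round(len(task_vector) * keep_fraction))
    # The magnitude of the k-th largest entry serves as the cut-off.
    threshold = sorted((abs(v) for v in task_vector), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in task_vector]


def elect_signs(trimmed: List[Vector]) -> List[int]:
    """Step 2: pick one sign per parameter.

    Assumption: the elected sign is the sign of the summed trimmed values.
    """
    signs = []
    for values in zip(*trimmed):
        total = sum(values)
        signs.append(1 if total > 0 else -1 if total < 0 else 0)
    return signs


def merge_aligned(trimmed: List[Vector], signs: List[int]) -> Vector:
    """Step 3: average only the values whose sign matches the elected sign."""
    merged = []
    for values, sign in zip(zip(*trimmed), signs):
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def ties_merge(pretrained: Vector, finetuned: List[Vector],
               keep_fraction: float = 0.2, scale: float = 1.0) -> Vector:
    """Merge fine-tuned models that share one pre-trained starting point."""
    # Change of each parameter relative to the pre-trained model.
    task_vectors = [[f - p for f, p in zip(model, pretrained)]
                    for model in finetuned]
    trimmed = [trim(tv, keep_fraction) for tv in task_vectors]
    signs = elect_signs(trimmed)
    merged = merge_aligned(trimmed, signs)
    # `scale` is a hypothetical knob for how strongly the merged change is applied.
    return [p + scale * m for p, m in zip(pretrained, merged)]


if __name__ == "__main__":
    pretrained = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, 0.1, -2.0, 0.0]
    model_b = [-3.0, 0.05, -1.0, 0.2]
    print(ties_merge(pretrained, [model_a, model_b], keep_fraction=0.5))
    # prints [-3.0, 0.0, -1.5, 0.0]
```

In this toy example a plain average of the two models would give -1.0 for the first parameter, because the changes +1.0 and -3.0 partly cancel. The sketch instead elects the negative sign and keeps only the aligned value -3.0, while the small changes in the second and fourth parameters are reset to the pre-trained value.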

