The advent of large language models (LLMs) such as GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach to MTL: it improves performance on multiple tasks by adding the corresponding task vectors to a pre-trained model. However, no existing method simultaneously achieves optimal performance, computational efficiency, and data privacy, which limits the application of task arithmetic to LLMs. In this paper, we propose \textbf{M}odel \textbf{E}xclusive \textbf{T}ask \textbf{A}rithmetic for merging \textbf{GPT}-scale models (MetaGPT), which formalizes the objective of model merging as a multi-task learning problem, aiming to minimize the average loss difference between the merged model and each individual task model. Because data privacy precludes the use of multi-task training data, we exploit the local linearity of LLMs and the orthogonality of task vectors to separate the data term from the scaling-coefficient term, and thereby derive a model-exclusive task arithmetic method. MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs. Extensive experiments demonstrate that MetaGPT improves upon task arithmetic and achieves state-of-the-art performance on multiple tasks.
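For concreteness, the task-arithmetic operation referred to above can be sketched as follows, where $\theta_{\text{pre}}$ denotes the pre-trained weights, $\theta_k$ the weights fine-tuned on task $k$, and $\lambda_k$ the scaling coefficients; the notation here is illustrative rather than taken from the paper body:
\[
\tau_k = \theta_k - \theta_{\text{pre}}, \qquad
\theta_{\text{merged}} = \theta_{\text{pre}} + \sum_{k=1}^{K} \lambda_k \, \tau_k .
\]
Conventional task arithmetic tunes the $\lambda_k$ by searching over held-out validation data; per the abstract, MetaGPT instead derives them from the models alone, which is what makes the procedure data-agnostic.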