Merging models that are fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been shown to be a cheap and scalable strategy for constructing a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, shows that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
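As a rough illustration of the two ingredients named in the abstract, the sketch below combines task arithmetic (summing the differences between fine-tuned and pre-trained parameters) with a mask sampled from a binary Concrete (relaxed Bernoulli) distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the flat-list parameter representation, the `scale` and `temperature` values, and the choice to apply one shared mask to the summed task vectors are all hypothetical, and the bi-level meta-learning loop that would actually train the mask logits is omitted.

```python
import math
import random


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def task_vector(finetuned, pretrained):
    # Task vector: fine-tuned parameters minus pre-trained parameters.
    return [f - p for f, p in zip(finetuned, pretrained)]


def concrete_mask(logits, temperature=0.5, rng=random):
    # Sample a relaxed Bernoulli (binary Concrete) mask with entries in [0, 1].
    # Lower temperatures push the entries closer to 0 or 1.
    mask = []
    for logit in logits:
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        noise = math.log(u) - math.log(1.0 - u)  # logistic noise
        mask.append(_sigmoid((logit + noise) / temperature))
    return mask


def merge_with_mask(pretrained, finetuned_models, mask, scale=0.3):
    # Task arithmetic restricted by a shared mask (an assumption of this sketch):
    # merged = pretrained + scale * mask * sum(task vectors).
    vectors = [task_vector(f, pretrained) for f in finetuned_models]
    merged = []
    for i, p in enumerate(pretrained):
        total = sum(v[i] for v in vectors)
        merged.append(p + scale * mask[i] * total)
    return merged
```

In the method described above, the mask logits would be learned by gradient-based optimization at the upper level rather than fixed by hand; here they would simply be passed in, for example `concrete_mask([2.0, -2.0, 0.0])`, and the resulting mask handed to `merge_with_mask`.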