Multi-task learning (MTL) aims to enable a single model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has shown that several models, each fine-tuned for a distinct task, can be merged directly into a single model that performs MTL, without retraining on the original training data. However, this direct addition of models often significantly degrades the overall performance of the merged model, owing to potential conflicts and intricate correlations among the tasks. This raises the question of how to merge pre-trained models more effectively without using their original training data. This paper introduces Adaptive Model Merging (AdaMerging), a technique that autonomously learns the model-merging coefficients, either task-wise or layer-wise, without relying on the original training data. Specifically, AdaMerging operates as an automatic, unsupervised task arithmetic scheme: it uses entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Experiments across eight tasks demonstrate the efficacy of AdaMerging. Compared with the current state-of-the-art task arithmetic merging scheme, AdaMerging achieves an 11% improvement in performance. AdaMerging also generalizes better to unseen downstream tasks and is significantly more robust to data distribution shifts that may occur during the testing phase.
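The task-wise idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes toy linear classifiers in place of fine-tuned networks, treats a task vector as the difference between fine-tuned and pre-trained weights, and swaps backpropagation for finite-difference gradients with a simple step-halving rule. All function names, the initial coefficient of 0.3, and the random data are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum to keep the exponentials numerically stable.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_entropy(weights, x):
    # Average Shannon entropy of a linear classifier's predictions on x.
    probs = softmax(x @ weights)
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=1).mean())

def merge(pretrained, task_vectors, coeffs):
    # Task arithmetic: theta = theta_pre + sum_k lambda_k * tau_k.
    return pretrained + sum(c * tv for c, tv in zip(coeffs, task_vectors))

def total_entropy(coeffs, pretrained, task_vectors, batches):
    # Surrogate objective: summed prediction entropy over unlabeled batches,
    # one batch per task (an assumption of this toy setup).
    merged = merge(pretrained, task_vectors, coeffs)
    return sum(mean_entropy(merged, x) for x in batches)

def learn_coefficients(pretrained, task_vectors, batches,
                       init=0.3, lr=0.1, steps=50, eps=1e-4):
    # Task-wise coefficients, one per task vector, refined iteratively.
    coeffs = np.full(len(task_vectors), init, dtype=float)
    current = total_entropy(coeffs, pretrained, task_vectors, batches)
    for _ in range(steps):
        # Central finite differences stand in for backpropagation here.
        grad = np.zeros_like(coeffs)
        for k in range(len(coeffs)):
            bump = np.zeros_like(coeffs)
            bump[k] = eps
            plus = total_entropy(coeffs + bump, pretrained, task_vectors, batches)
            minus = total_entropy(coeffs - bump, pretrained, task_vectors, batches)
            grad[k] = (plus - minus) / (2 * eps)
        candidate = coeffs - lr * grad
        value = total_entropy(candidate, pretrained, task_vectors, batches)
        if value <= current:
            coeffs, current = candidate, value  # accept the step
        else:
            lr *= 0.5  # reject the step and shrink the learning rate
    return coeffs

# Hypothetical toy data: 8 input features, 3 classes, 2 tasks.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(8, 3))
task_vectors = [rng.normal(size=(8, 3)) for _ in range(2)]
batches = [rng.normal(size=(32, 8)) for _ in range(2)]
learned = learn_coefficients(pretrained, task_vectors, batches)
print("learned coefficients:", learned)
```

Because a step is accepted only when it does not increase the objective, the learned coefficients never yield higher entropy than the initial ones. A layer-wise variant would, under the same assumptions, keep one coefficient per task and per layer instead of one per task.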