Pre-trained language models are continually fine-tuned to better support downstream applications. However, fine-tuning can significantly degrade performance on general tasks outside the target domain. To overcome this problem, we propose a novel method that keeps the fine-tuned model resilient on general tasks. Our method takes the form of model merging (named LM-Cocktail): the fine-tuned language model is merged with the pre-trained base model, or with peer models from other domains, through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its target domain. We conduct comprehensive experiments with LLama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, and the results validate the efficacy of the proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding.
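To make the merging step concrete, the following is a minimal sketch of a weighted average over model parameters. It is an illustration under stated assumptions, not the released implementation: the function name `merge_models`, the `Params` type (plain dictionaries of float lists standing in for real tensors), and the equal weights in the usage lines are all hypothetical, and the abstract does not specify how the merging weights are chosen.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the element-wise weighted average of several parameter sets."""
    if not models:
        raise ValueError("need at least one model")
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merging weights sum to 1.
    norm = [w / total for w in weights]

    merged: Params = {}
    for name, reference in models[0].items():
        # All models must share the same architecture (names and shapes).
        for m in models[1:]:
            if name not in m or len(m[name]) != len(reference):
                raise ValueError(f"parameter {name!r} does not match across models")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(len(reference))
        ]
    return merged


# Toy usage: merge a fine-tuned model with its base model.
# The equal weights here are an assumption chosen only for illustration.
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = merge_models([fine_tuned, base], weights=[0.5, 0.5])
```

Because the merge is a plain average over parameters, it only applies to models that share one architecture, and it needs no additional training. Passing more than two parameter sets covers the case of merging with peer models from other domains.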