Pre-trained language models are continually fine-tuned to better support downstream applications. However, fine-tuning can significantly degrade performance on general tasks outside the target domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to stay resilient across general tasks. Our method takes the form of model merging: the fine-tuned language model is merged with the pre-trained base model, or with peer models from other domains, through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its target domain. We conduct comprehensive experiments with Llama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, and the results validate the efficacy of the proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
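To make the merging step concrete, the following is a minimal sketch of a weighted average over model parameters. It is an illustration under stated assumptions, not the released implementation: the function name `merge_models`, the toy parameter names, and the equal merging weights are all hypothetical, plain Python lists stand in for real tensors, and the models are assumed to share one architecture so that parameters line up by name and shape.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a checkpoint: parameter name -> flattened values.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the element-wise weighted average of several parameter sets.

    Hypothetical helper for illustration; assumes identical architectures.
    """
    if not models:
        raise ValueError("need at least one model")
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("weights are assumed to sum to 1")

    merged: Params = {}
    for name, reference in models[0].items():
        # Every model must provide this parameter with the same length.
        for m in models:
            if name not in m or len(m[name]) != len(reference):
                raise ValueError(f"parameter {name!r} does not match across models")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(len(reference))
        ]
    return merged


# Toy example: blend a fine-tuned model with its base model, half and half.
# The 0.5/0.5 split is an arbitrary choice made for this sketch.
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = merge_models([fine_tuned, base], weights=[0.5, 0.5])
print(merged)  # {'layer.weight': [0.5, 3.0], 'layer.bias': [2.0]}
```

Passing more than two parameter sets covers the case of merging with peer models from other domains; how the merging weights should be chosen is not specified here, so the values above are placeholders.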