Pre-trained language models are continually fine-tuned to better support downstream applications. However, fine-tuning can significantly degrade performance on general tasks outside the target domain. To overcome this problem, we propose LM-Cocktail, which enables the fine-tuned model to remain resilient on general tasks. Our method takes the form of model merging: the fine-tuned language model is merged with the pre-trained base model, or with peer models from other domains, through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its target domain. We conduct comprehensive experiments with LLama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, and the results validate the efficacy of the proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
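The weighted-average merging described in the abstract can be illustrated with a minimal sketch. This is not the released LM-Cocktail code: the function name `merge_models`, the plain-dictionary parameter format, and the example weights are all assumptions made for illustration, and the abstract does not say how the merging weights are chosen.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter format: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_models(state_dicts: Sequence[StateDict],
                 weights: Sequence[float]) -> StateDict:
    """Merge models by a parameter-wise weighted average.

    The weights are normalized to sum to 1 before averaging. How the
    weights should be set is an assumption left to the caller.
    """
    if not state_dicts or len(state_dicts) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    # All models must share the same architecture (same parameter names).
    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("models have different parameter names")

    merged: StateDict = {}
    for name in names:
        size = len(state_dicts[0][name])
        if any(len(sd[name]) != size for sd in state_dicts):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across all models.
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(size)
        ]
    return merged


# Toy example: a "base" model and a "fine-tuned" model with one parameter.
base = {"w": [0.0, 2.0]}
fine_tuned = {"w": [1.0, 4.0]}
merged = merge_models([base, fine_tuned], weights=[0.5, 0.5])
```

With real checkpoints the same averaging would be applied tensor by tensor; peer models from other domains can be included by passing additional parameter dictionaries and weights.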