Pre-trained language models are continually fine-tuned to better support downstream applications. However, fine-tuning may significantly degrade performance on general tasks beyond the targeted domain. To overcome this problem, we propose a novel method that keeps the fine-tuned model resilient on general tasks. Our method takes the form of model merging (namely LM-Cocktail), where the fine-tuned language model is merged with the pre-trained base model or with peer models from other domains through a weighted average. Despite its simplicity, LM-Cocktail is surprisingly effective: the resulting model achieves strong empirical performance across the whole scope of general tasks while preserving superior capacity in its targeted domain. We conduct comprehensive experiments with LLama and BGE models on popular benchmarks, including FLAN, MMLU, and MTEB, and the results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
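The weighted-average merging described in the abstract can be illustrated with a minimal sketch. This is not the API of the released code: the function name `merge_models`, the `Params` type, the toy parameter names, and the equal weights are all hypothetical and chosen only for illustration, and plain Python lists stand in for real model tensors. The abstract does not say how the merging weights are selected, so the sketch simply takes them as input and normalizes them to sum to one.

```python
from typing import Dict, List, Sequence

# Hypothetical toy representation: parameter name -> flattened values.
# Real models would hold tensors; lists keep the sketch dependency-free.
Params = Dict[str, List[float]]


def merge_models(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return the element-wise weighted average of several models' parameters."""
    if not models or len(models) != len(weights):
        raise ValueError("need at least one model and exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the merged parameters stay on the same scale as the inputs.
    norm = [w / total for w in weights]

    # Averaging only makes sense for models with identical architectures.
    names = set(models[0])
    if any(set(m) != names for m in models[1:]):
        raise ValueError("all models must share the same parameter names")

    merged: Params = {}
    for name in models[0]:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(norm, models))
            for i in range(size)
        ]
    return merged


# Toy example: merge a fine-tuned model with its base model, equal weights (assumed).
base = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [4.0, 2.0], "layer.bias": [3.0]}
merged = merge_models([fine_tuned, base], weights=[0.5, 0.5])
print(merged)
```

Merging with peer models from other domains would, under the same assumptions, amount to passing more than two parameter dictionaries together with one weight for each.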