As open-weight large language models (LLMs) achieve ever more impressive performance across a wide range of tasks in English, practitioners aim to adapt these models to different languages. However, such language adaptation is often accompanied by catastrophic forgetting of the base model's capabilities, severely limiting the usefulness of the resulting model. We address this issue by proposing Branch-and-Merge (BaM), a new adaptation method based on iteratively merging multiple models, each fine-tuned on a subset of the available training data. BaM is based on the insight that this yields lower-magnitude but higher-quality weight changes, reducing forgetting of the source domain while maintaining learning on the target domain. We demonstrate in an extensive empirical study on Bulgarian and German that BaM can significantly reduce forgetting while matching or even improving target domain performance compared to both standard continued pretraining and instruction fine-tuning across different model architectures.
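To make the iterative branch-and-merge scheme concrete, the sketch below shows one possible instantiation in Python, assuming a simple element-wise parameter-averaging merge and a hypothetical `fine_tune` helper standing in for the continued-pretraining step; the merge operator and training setup used in the paper may differ.

```python
# Minimal sketch of the Branch-and-Merge (BaM) idea, assuming parameter averaging
# as the merge operator. `fine_tune` is a hypothetical stand-in for any
# continued-pretraining / instruction-tuning routine on a data subset.

from typing import Callable, Dict, List
import torch

StateDict = Dict[str, torch.Tensor]


def merge_average(branches: List[StateDict]) -> StateDict:
    """Merge branch checkpoints by element-wise averaging of their parameters."""
    merged: StateDict = {}
    for name in branches[0]:
        merged[name] = torch.stack([b[name].float() for b in branches]).mean(dim=0)
    return merged


def branch_and_merge(
    base: StateDict,
    data_slices: List[List[object]],  # disjoint subsets of the target-language corpus
    fine_tune: Callable[[StateDict, List[object]], StateDict],  # hypothetical trainer
    iterations: int = 2,
) -> StateDict:
    """Iteratively branch the current model, fine-tune each branch on one data
    subset, and merge the branches back into a single model."""
    current = base
    for _ in range(iterations):
        # Branch: fine-tune independent copies of the current model on each subset.
        branches = [fine_tune(dict(current), subset) for subset in data_slices]
        # Merge: combine branches into one model before the next iteration,
        # yielding smaller but higher-quality weight changes per iteration.
        current = merge_average(branches)
    return current
```

The key design choice illustrated here is that each branch only sees a fraction of the data, so its weight update stays small, while merging aggregates the useful signal from all branches before the next round.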