Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with the limited data available for low-resource languages, failing to balance language modeling and task-solving capabilities. We therefore propose model merging as an alternative for low-resource languages: it combines models with distinct capabilities into a single model without additional training. We use model merging to develop task-solving LLMs for low-resource languages without any SFT data in the target languages. Our experiments on Llama-2-7B demonstrate that model merging effectively endows low-resource-language LLMs with task-solving abilities, outperforming CT-then-SFT when data are extremely scarce. Observing that the performance of merged models saturates as training tokens increase, we further analyze the merging process and introduce a slack variable into the merging algorithm to mitigate the loss of important parameters, thereby improving performance. We hope that model merging, with its higher data efficiency, can benefit more human languages suffering from data scarcity.
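To make the merging idea concrete, the sketch below shows a task-arithmetic-style merge in PyTorch: the CT language model and the SFT task model are expressed as parameter deltas from the shared base model (here, Llama-2-7B), weighted, summed, and added back to the base, all without any gradient updates. This is a minimal illustration under stated assumptions; the function name, the coefficients lam_lang/lam_task, and the sign-agreement slack term are hypothetical, not the paper's exact algorithm.

```python
# Minimal, hypothetical sketch of task-arithmetic-style model merging.
# Assumes `base`, `ct`, and `sft` are state dicts of models sharing one
# architecture (e.g., Llama-2-7B). The `slack` term below is an illustrative
# stand-in for the paper's slack variable, not its exact formulation.
import torch

def merge_state_dicts(base, ct, sft, lam_lang=1.0, lam_task=1.0, slack=0.0):
    merged = {}
    for name, w_base in base.items():
        v_lang = ct[name] - w_base   # language-modeling task vector (CT delta)
        v_task = sft[name] - w_base  # task-solving task vector (SFT delta)
        update = lam_lang * v_lang + lam_task * v_task
        # Hypothetical slack: where the two deltas agree in sign, boost the
        # update slightly so important shared parameter directions are kept.
        agree = (torch.sign(v_lang) == torch.sign(v_task)).float()
        merged[name] = w_base + update * (1.0 + slack * agree)
    return merged

# Usage (illustrative): load the merged parameters into a fresh model instance.
# merged_sd = merge_state_dicts(base_sd, ct_sd, sft_sd, slack=0.1)
# model.load_state_dict(merged_sd)
```

With slack=0.0 this reduces to plain task arithmetic; a small positive value relaxes the scaling on parameters where both capabilities push in the same direction, which is one plausible way to read "mitigating the loss of important parameters."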