Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can yield varying performance across domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and generalizes well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which requires neither further training nor additional training data. Specifically, our method aggregates the weights of different language models into a population and generates offspring models through mutation and crossover operations. Offspring models are then evaluated against their parents, and those that achieve higher performance on development datasets are retained. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at {https://github.com/duguodong7/model-evolution}.
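The population-based loop described above (crossover and mutation over model weights, keeping offspring that beat their parents on a development set) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each model is treated as a flat weight vector, and `fitness` stands in for development-set performance.

```python
import numpy as np

def crossover(a, b, rng):
    # Element-wise mixing: each weight is inherited from parent a or b.
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(w, rng, sigma=0.05):
    # Small Gaussian perturbation of the weights.
    return w + rng.normal(0.0, sigma, size=w.shape)

def evolve(population, fitness, steps=500, seed=0):
    """Evolve a population of weight vectors; return the best individual.

    `fitness` is a hypothetical scoring function (higher is better),
    playing the role of accuracy on a development dataset.
    """
    rng = np.random.default_rng(seed)
    pop = [w.copy() for w in population]
    scores = [fitness(w) for w in pop]
    for _ in range(steps):
        i, j = rng.choice(len(pop), size=2, replace=False)
        child = mutate(crossover(pop[i], pop[j], rng), rng)
        s = fitness(child)
        # Replace the weaker parent only if the child improves on it.
        k = i if scores[i] < scores[j] else j
        if s > scores[k]:
            pop[k], scores[k] = child, s
    return pop[int(np.argmax(scores))]

# Toy usage: "merge" two weight vectors toward a target optimum.
target = np.ones(8)
fitness = lambda w: -float(np.sum((w - target) ** 2))
pop = [np.zeros(8), np.full(8, 2.0)]
best = evolve(pop, fitness)
```

Because offspring only replace a parent when they score higher, the population's best fitness is monotonically non-decreasing, mirroring the preservation criterion described in the abstract.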