Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can yield varying performance across domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and generalizes well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which requires neither further training nor additional training data. Specifically, our method aggregates the weights of different language models into a population and generates offspring models through mutation and crossover operations. Offspring models are then evaluated against their parents, and those that achieve higher performance on development datasets are retained. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at {https://github.com/duguodong7/model-evolution}.
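The population-based loop described above (crossover and mutation over model weights, keeping offspring that beat their parents on a development set) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each model is treated as a flat weight vector, and `fitness` stands in for development-set performance.

```python
import numpy as np

def crossover(a, b, rng):
    # Element-wise mixing: each weight is inherited from parent a or b.
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(w, rng, sigma=0.05):
    # Small Gaussian perturbation of the weights.
    return w + rng.normal(0.0, sigma, size=w.shape)

def evolve(population, fitness, steps=500, seed=0):
    """Evolve a population of weight vectors; return the best individual.

    `fitness` is a hypothetical scoring function (higher is better),
    playing the role of accuracy on a development dataset.
    """
    rng = np.random.default_rng(seed)
    pop = [w.copy() for w in population]
    scores = [fitness(w) for w in pop]
    for _ in range(steps):
        i, j = rng.choice(len(pop), size=2, replace=False)
        child = mutate(crossover(pop[i], pop[j], rng), rng)
        s = fitness(child)
        # Replace the weaker parent only if the child improves on it.
        k = i if scores[i] < scores[j] else j
        if s > scores[k]:
            pop[k], scores[k] = child, s
    return pop[int(np.argmax(scores))]

# Toy usage: "merge" two weight vectors toward a target optimum.
target = np.ones(8)
fitness = lambda w: -float(np.sum((w - target) ** 2))
pop = [np.zeros(8), np.full(8, 2.0)]
best = evolve(pop, fitness)
```

Because offspring only replace a parent when they score higher, the population's best fitness is monotonically non-decreasing, mirroring the preservation criterion described in the abstract.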