Extensive fine-tuning of large language models does not always yield better results. Models often become better at imitating one form of data without gaining reasoning ability, and may even lose some of the capability they already had. Here I introduce EvoMerge, a systematic approach to large language model training and merging. By using model merging for weight crossover and fine-tuning for weight mutation, EvoMerge establishes an evolutionary process aimed at pushing models beyond the limits of conventional fine-tuning.
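The crossover-and-mutation loop described above can be illustrated with a toy sketch. It is not the EvoMerge implementation: weights are plain lists of floats rather than model checkpoints, linear interpolation stands in for model merging, a noisy gradient step on a squared-error loss stands in for fine-tuning, and the names `merge`, `fine_tune`, `fitness`, and `evolve`, together with the elitist selection rule, are assumptions made for illustration.

```python
import random


def merge(parent_a, parent_b, alpha=0.5):
    """Crossover: linearly interpolate two weight vectors (assumed merge method)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]


def fine_tune(weights, target, lr=0.1, noise=0.05, rng=random):
    """Mutation: one noisy gradient step on a toy squared-error loss."""
    return [
        w - lr * 2.0 * (w - t) + rng.gauss(0.0, noise)
        for w, t in zip(weights, target)
    ]


def fitness(weights, target):
    """Higher is better: negative squared error against the toy target."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve(population, target, generations=20, seed=0):
    """Run the merge -> fine-tune -> select loop and return the final population."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        children = []
        for _ in range(size):
            # Pick two distinct parents and merge them with a random ratio.
            a, b = rng.sample(population, 2)
            child = merge(a, b, rng.random())
            # "Mutate" the merged child with a fine-tuning step.
            children.append(fine_tune(child, target, rng=rng))
        # Elitist selection (an assumption): keep the best `size` candidates
        # among parents and children, so the best fitness never decreases.
        candidates = population + children
        candidates.sort(key=lambda w: fitness(w, target), reverse=True)
        population = candidates[:size]
    return population
```

In a real setting the fitness function would be a benchmark evaluation and each "individual" a full set of model weights. The sketch only shows how merging and fine-tuning slot into the crossover and mutation roles of an evolutionary loop.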