Recent advances in building domain-specific large language models (LLMs) have shown remarkable success, especially in tasks requiring reasoning abilities such as logical inference over complex relationships and multi-step problem solving. However, creating a powerful all-in-one LLM remains challenging due to the need for proprietary data and vast computational resources. As a resource-friendly alternative, we explore the potential of merging multiple expert models into a single LLM. Existing studies on model merging mainly focus on generalist LLMs rather than domain experts, or are limited to LLMs of the same architecture and size. In this work, we propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures, with a focus on reasoning tasks. A fine-grained layer-wise weight merging strategy is designed for homogeneous model merging, while heterogeneous model merging builds on the probabilistic distribution knowledge derived from instruction-response fine-tuning data. Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal a key finding: combinatorial reasoning emerges from merging, surpassing simple additive effects. We propose that unconstrained model merging could serve as a foundation for decentralized LLMs, marking a notable progression from the existing centralized LLM paradigm. This evolution could enable wider participation and stimulate further advances in artificial intelligence, effectively addressing the constraints posed by centralized models.
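The abstract names but does not spell out the layer-wise strategy for homogeneous models. Below is a minimal sketch, assuming plain per-tensor interpolation between two fine-tuned checkpoints that share one architecture; the function name `layerwise_merge` and the per-layer coefficient map `alphas` are illustrative placeholders, not the paper's API, and the paper's actual procedure for choosing the coefficients is not reproduced here.

```python
# Minimal sketch: layer-wise weight merging of two homogeneous checkpoints.
# Assumes both state dicts come from the same architecture, so every
# parameter name maps to tensors of identical shape.
import torch


def layerwise_merge(state_dict_a, state_dict_b, alphas):
    """Interpolate two compatible state dicts, one coefficient per tensor.

    `alphas` maps a parameter name to a weight in [0, 1]; alpha = 1.0 keeps
    model A's tensor unchanged, alpha = 0.0 keeps model B's.
    """
    merged = {}
    for name, tensor_a in state_dict_a.items():
        tensor_b = state_dict_b[name]
        alpha = alphas.get(name, 0.5)  # hypothetical default: uniform average
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged


if __name__ == "__main__":
    a = {"layer.weight": torch.ones(2, 2)}
    b = {"layer.weight": torch.zeros(2, 2)}
    print(layerwise_merge(a, b, {"layer.weight": 0.75}))  # tensor of 0.75s
```

Per-tensor coefficients are what makes the scheme "fine-grained": unlike a single global mixing ratio, each layer can lean toward whichever expert contributes more to the target capability.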
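For the heterogeneous case, the abstract only states that merging builds on probabilistic distribution knowledge from instruction-response data. One common reading is distribution matching (knowledge distillation) over the teacher's output tokens; the sketch below assumes that interpretation, plus aligned vocabularies between the models, which heterogeneous tokenizers need not share. It is a sketch under those assumptions, not the paper's objective.

```python
# Hedged sketch: transferring distribution knowledge via a standard
# temperature-scaled KL distillation loss on instruction-response pairs.
# Assumes student and teacher logits are defined over the same vocabulary.
import torch
import torch.nn.functional as F


def distribution_transfer_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between teacher and student token distributions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean matches the mathematical definition of KL; the T^2 factor is
    # the usual gradient-scale correction for temperature-softened targets.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    student = torch.randn(4, 32000)  # (batch, vocab) logits, vocab size assumed
    teacher = torch.randn(4, 32000)
    print(distribution_transfer_loss(student, teacher).item())
```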