Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey examines model merging in the LLM era through the \textbf{FUSE} taxonomy, organized along \textbf{F}oundations, \textbf{U}nification Strategies, \textbf{S}cenarios, and \textbf{E}cosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry and mode connectivity, then systematically review the algorithmic space spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, and federated learning, and survey the supporting ecosystem of tools and evaluation benchmarks. Finally, we identify key open challenges and future directions, aiming to equip researchers and practitioners with a structured foundation for advancing model merging.