Model merging has emerged as a transformative paradigm for combining the capabilities of multiple neural networks into a single unified model without additional training. With the rapid proliferation of fine-tuned large language models~(LLMs), merging techniques offer a computationally efficient alternative to ensembling and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey presents a comprehensive and structured examination of model merging in the LLM era through the \textbf{FUSE} taxonomy, a four-dimensional framework organized along \textbf{F}oundations, \textbf{U}nification Strategies, \textbf{S}cenarios, and \textbf{E}cosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry, mode connectivity, and the linear mode connectivity hypothesis. We then systematically review the algorithmic landscape, spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization approaches. For each method family, we analyze the core formulation, highlight representative works, and discuss practical trade-offs. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, multilingual transfer, and federated learning. Finally, we survey the supporting ecosystem of open-source tools, community platforms, and evaluation benchmarks, and identify key open challenges, including theoretical gaps, scalability barriers, and standardization needs. This survey aims to equip researchers and practitioners with a structured foundation for advancing model merging.