Deep Model Fusion: A Survey

Deep model fusion (also called model merging) is an emerging technique that combines the parameters or predictions of multiple deep learning models into a single model. By combining the strengths of different models, it compensates for the biases and errors of any individual model and achieves better performance. However, fusing large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, a high-dimensional parameter space, and interference between heterogeneous models. Although model fusion has attracted widespread attention for its potential to solve complex real-world tasks, a complete and detailed survey of the technique is still lacking. To better understand model fusion methods and promote their development, we present a comprehensive survey of recent progress. Specifically, we group existing deep model fusion methods into four categories: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss to obtain a better initialization for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; and (4) "Ensemble learning", which combines the outputs of diverse models and is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges facing deep model fusion and propose possible directions for future research. Our review helps clarify the relationships among different model fusion methods and their practical applications, and can inform further research in the field of deep model fusion.
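To make three of the four categories concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a method taken from the survey: the tiny one-hidden-layer network, the parameter values, and the helper names (`mlp`, `average_weights`, `ensemble_predict`, `permute_units`) are all hypothetical. It contrasts weight averaging (one fused model) with ensembling (averaged outputs), and shows why alignment matters: reordering the hidden units of one model leaves its function unchanged but changes the result of naive weight averaging.

```python
import math

def mlp(params, x):
    """Toy one-hidden-layer network: y = sum_j v[j] * tanh(w[j] * x)."""
    w, v = params
    return sum(vj * math.tanh(wj * x) for wj, vj in zip(w, v))

def average_weights(models):
    """Weight averaging: element-wise mean of parameters, giving ONE model."""
    n = len(models)
    hidden = len(models[0][0])
    w = [sum(m[0][j] for m in models) / n for j in range(hidden)]
    v = [sum(m[1][j] for m in models) / n for j in range(hidden)]
    return (w, v)

def ensemble_predict(models, x):
    """Ensembling: mean of the models' outputs; every model is kept and run."""
    return sum(mlp(m, x) for m in models) / len(models)

def permute_units(params, order):
    """Reorder hidden units; the function the network computes is unchanged."""
    w, v = params
    return ([w[i] for i in order], [v[i] for i in order])

# Two hypothetical models with similar, unit-by-unit matching parameters.
model_a = ([1.0, -2.0], [0.5, 1.5])
model_b = ([1.2, -1.8], [0.4, 1.6])
# Same function as model_b, but with its hidden units stored in swapped order.
model_b_shuffled = permute_units(model_b, [1, 0])

x = 0.7
ensemble_out = ensemble_predict([model_a, model_b], x)
fused_out = mlp(average_weights([model_a, model_b]), x)
misaligned_out = mlp(average_weights([model_a, model_b_shuffled]), x)

print("ensemble of outputs:        ", ensemble_out)
print("weight average (aligned):   ", fused_out)
print("weight average (misaligned):", misaligned_out)
```

In this toy setting the aligned weight average stays close to the ensemble prediction, while the misaligned average drifts away from it, even though `model_b` and `model_b_shuffled` compute the same function. Matching units before averaging is the role the abstract assigns to "Alignment".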

