Deep model fusion (or merging) is an emerging technique that combines the parameters or predictions of multiple deep learning models into a single model. By pooling the abilities of different models, it compensates for the biases and errors of any individual model and achieves better performance. However, applying deep model fusion to large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, a high-dimensional parameter space, and interference between heterogeneous models. Although model fusion has attracted widespread attention for its potential to solve complex real-world tasks, a complete and detailed survey of the technique is still lacking. To better understand model fusion methods and promote their development, we present a comprehensive survey summarizing recent progress. Specifically, we group existing deep model fusion methods into four categories: (1) "Mode connectivity", which connects solutions in weight space via a path of non-increasing loss in order to obtain a better initialization for model fusion; (2) "Alignment", which matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method that averages the weights of multiple models to obtain more accurate results closer to the optimal solution; and (4) "Ensemble learning", which combines the outputs of diverse models and is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges facing deep model fusion and propose possible directions for future research. Our review helps clarify the relationships between different model fusion methods and practical application methods, and can inform further research in the field of deep model fusion.
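To make the difference between parameter-level and prediction-level fusion concrete, the following is a minimal sketch and not the survey's own implementation. It assumes two models with an identical two-layer ReLU architecture whose parameters are stored as NumPy arrays in dictionaries. The names `average_weights`, `interpolate`, `ensemble_predict`, `mlp_forward`, and `make_params` are hypothetical helpers introduced only for illustration. Weight averaging merges parameters into one model, linear interpolation traces the simplest candidate path between two solutions in weight space (it is not guaranteed to be a path of non-increasing loss), and ensembling keeps both models and averages their outputs. Alignment is not shown; naive averaging here assumes the units of the two models already correspond.

```python
import numpy as np


def average_weights(param_dicts, coeffs=None):
    """Weight averaging: merge same-architecture models into one parameter set."""
    n = len(param_dicts)
    coeffs = [1.0 / n] * n if coeffs is None else coeffs
    return {
        name: sum(c * params[name] for c, params in zip(coeffs, param_dicts))
        for name in param_dicts[0]
    }


def interpolate(params_a, params_b, t):
    """Point at position t on the straight line between two solutions."""
    return average_weights([params_a, params_b], [1.0 - t, t])


def mlp_forward(params, x):
    """Forward pass of a two-layer ReLU network (assumed toy architecture)."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def ensemble_predict(param_dicts, x):
    """Ensemble learning: average the outputs, keeping every model."""
    return np.mean([mlp_forward(p, x) for p in param_dicts], axis=0)


def make_params(rng, d_in=4, d_hidden=8, d_out=2):
    """Random parameters standing in for a trained model (illustrative only)."""
    return {
        "W1": rng.normal(size=(d_in, d_hidden)),
        "b1": rng.normal(size=d_hidden),
        "W2": rng.normal(size=(d_hidden, d_out)),
        "b2": rng.normal(size=d_out),
    }


rng = np.random.default_rng(0)
model_a, model_b = make_params(rng), make_params(rng)
x = rng.normal(size=(5, 4))

merged = average_weights([model_a, model_b])            # one model, one forward pass
fused_output = mlp_forward(merged, x)
ensemble_output = ensemble_predict([model_a, model_b], x)  # two forward passes
midpoint = interpolate(model_a, model_b, 0.5)           # same as the uniform average
```

The merged model costs a single forward pass at inference time, while the ensemble costs one pass per member. Because of the nonlinearity, `fused_output` and `ensemble_output` generally differ.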