Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development because it bypasses the need for original training data and additional training. However, most existing model merging approaches explore only the parameter space, merging models with identical architectures. Merging in the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenge of layer compatibility. This paper takes a significant step toward more flexible and comprehensive model merging by formulating the architecture-space merging process as a reinforcement learning task. We train policy and value networks via offline sampling of weight vectors, then employ them for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.