Model merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhancing model performance across various tasks while maintaining computational efficiency. This paper introduces Activation-Informed Merging (AIM), a technique that integrates information from the activation space of LLMs into the merging process to improve performance and robustness. AIM is designed as a flexible, complementary solution that is applicable to any existing merging method. Drawing on principles from continual learning (CL) and model compression, it aims to preserve critical weights from the base model. Utilizing a task-agnostic calibration set, AIM selectively prioritizes essential weights during merging. We empirically demonstrate that AIM significantly enhances the performance of merged models across multiple benchmarks. Our findings suggest that incorporating activation-space information can yield substantial advances in model merging strategies for LLMs, with up to a 40% increase in benchmark performance.
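To make the core idea concrete, the sketch below illustrates one way activation-space information could prioritize base-model weights during a merge. This is a hypothetical simplification, not the paper's exact algorithm: the function `activation_informed_merge`, the per-channel mean-absolute-activation importance heuristic, and the `omega` blending parameter are all illustrative assumptions.

```python
# Minimal sketch, assuming a single linear layer's weights from the base
# model and a naively merged model, plus activations collected on a
# task-agnostic calibration set. The importance measure and blending rule
# below are hypothetical, chosen only to illustrate the general principle.
import torch

def activation_informed_merge(base: torch.Tensor,
                              merged: torch.Tensor,
                              calib_acts: torch.Tensor,
                              omega: float = 0.5) -> torch.Tensor:
    """Pull merged weights back toward the base model on the input
    channels that the calibration set activates most strongly.

    base, merged: (out_features, in_features) weight matrices.
    calib_acts:   (num_tokens, in_features) activations feeding this layer.
    omega:        maximum strength of the pull toward the base weights.
    """
    # Per-input-channel importance: mean absolute activation magnitude,
    # normalized to [0, 1]. Channels the calibration data excites more
    # are treated as more critical to preserve from the base model.
    importance = calib_acts.abs().mean(dim=0)
    importance = importance / (importance.max() + 1e-8)

    # Blend: critical input channels stay close to the base weights,
    # unimportant ones keep the merged value. Broadcasts over rows.
    alpha = omega * importance  # shape: (in_features,)
    return (1 - alpha) * merged + alpha * base

# Usage sketch with stand-in tensors: `merged_W` plays the role of a
# naive merge of fine-tuned models; `calib` plays the role of activations
# recorded (e.g., via a forward hook) on a calibration set.
if __name__ == "__main__":
    torch.manual_seed(0)
    base_W = torch.randn(8, 16)
    merged_W = base_W + 0.1 * torch.randn(8, 16)
    calib = torch.randn(128, 16)
    W = activation_informed_merge(base_W, merged_W, calib)
```

Because the calibration set is task-agnostic, this kind of adjustment can be applied on top of any merging method's output, which is what makes AIM complementary rather than a replacement for existing techniques.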