Model merging methods have recently proven effective at combining the task-specific abilities of multiple Large Language Models (LLMs). However, previous methods mainly target homogeneous models with identical architectures, and they struggle with Multimodal Large Language Models (MLLMs), which are inherently heterogeneous: they differ in model architecture and exhibit asymmetry in the parameter space. In this work, we propose AdaMMS, a novel model merging method tailored for heterogeneous MLLMs. Our method tackles these challenges in three steps: mapping, merging, and searching. Specifically, we first design a mapping function between models so that model merging can be applied to MLLMs with different architectures. We then apply linear interpolation to the model weights to adapt to the asymmetry of heterogeneous MLLMs. Finally, in the hyper-parameter searching step, we propose an unsupervised hyper-parameter selection method for model merging. AdaMMS is the first model merging method capable of merging heterogeneous MLLMs without labeled data. Extensive experiments on various model combinations demonstrate that AdaMMS outperforms previous model merging methods on various vision-language benchmarks.
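The mapping and merging steps can be illustrated with a minimal sketch. It is not the AdaMMS implementation: the abstract does not specify the mapping function or the interpolation formula, so the sketch assumes a plain name-to-name mapping and the standard linear interpolation (1 - alpha) * base + alpha * donor. The names `map_names`, `interpolate`, `name_map`, and `alpha`, as well as the parameter names in the example, are hypothetical. Weights are represented as flat lists of floats to keep the example self-contained. The unsupervised selection of `alpha` is not shown, because the abstract gives no details about it.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def map_names(donor: Weights, name_map: Dict[str, str]) -> Weights:
    """Rename donor parameters to the base model's parameter names.

    Assumption: the mapping is a simple name-to-name dictionary.
    Donor parameters without an entry in name_map are dropped.
    """
    return {name_map[k]: v for k, v in donor.items() if k in name_map}


def interpolate(base: Weights, donor: Weights, alpha: float) -> Weights:
    """Linearly interpolate the parameters that both models share.

    Parameters that exist only in the base model, or whose sizes differ,
    are copied from the base model unchanged (an assumption of this sketch).
    """
    merged: Weights = {}
    for name, w in base.items():
        other = donor.get(name)
        if other is None or len(other) != len(w):
            merged[name] = list(w)
        else:
            merged[name] = [(1 - alpha) * a + alpha * b for a, b in zip(w, other)]
    return merged


# Hypothetical toy weights: the two models name their language-model layer
# differently, and only the base model has a vision projection.
base = {"llm.layer0.weight": [1.0, 2.0], "vision.proj": [5.0]}
donor = {"language_model.layers.0.weight": [3.0, 4.0]}
name_map = {"language_model.layers.0.weight": "llm.layer0.weight"}

merged = interpolate(base, map_names(donor, name_map), alpha=0.5)
print(merged)
```

In this sketch, `alpha = 0` reproduces the base model and larger values move the shared parameters toward the donor. Treating the base and donor models differently in this way is one simple means of handling parameters that have no counterpart in the other model.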