Model merging has emerged as a promising technique for combining multiple fine-tuned models into a single multitask model without retraining. However, the factors that determine whether merging will succeed or fail remain poorly understood. In this work, we investigate why some models merge better than others. To do so, we propose a concrete, measurable definition of mergeability. We examine several potential causes of high or low mergeability, highlighting base model knowledge as a dominant factor: models fine-tuned on instances that the base model already knows well are more mergeable than models fine-tuned on instances that the base model struggles with. Building on our mergeability definition, we explore a simple weighted merging technique that better preserves knowledge that is weak in the base model.
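To make the idea of weighted merging concrete, the sketch below shows one common formulation: each fine-tuned model contributes its task vector (fine-tuned parameters minus base parameters), combined as a normalized weighted sum. This is a minimal illustration under assumed conventions, not the paper's exact method; the function name `weighted_merge`, the parameter layout, and the example weights are all hypothetical.

```python
# Minimal sketch of per-model weighted merging of fine-tuned checkpoints,
# assuming task-vector-style merging. The weighting scheme is illustrative,
# not the paper's specific technique.
from typing import Dict, List
import torch

StateDict = Dict[str, torch.Tensor]

def weighted_merge(base: StateDict,
                   finetuned: List[StateDict],
                   weights: List[float]) -> StateDict:
    """Merge fine-tuned models into the base model via a normalized,
    weighted sum of their task vectors (fine-tuned minus base)."""
    assert len(finetuned) == len(weights)
    total = sum(weights)
    merged = {name: p.clone() for name, p in base.items()}
    for sd, w in zip(finetuned, weights):
        for name, p in sd.items():
            merged[name] += (w / total) * (p - base[name])
    return merged

# Toy usage with random tensors standing in for model parameters.
if __name__ == "__main__":
    base = {"linear.weight": torch.randn(4, 4)}
    ft_a = {"linear.weight": base["linear.weight"] + 0.1 * torch.randn(4, 4)}
    ft_b = {"linear.weight": base["linear.weight"] + 0.1 * torch.randn(4, 4)}
    # Hypothetical weights, e.g. up-weighting the model fine-tuned on
    # instances the base model handles poorly.
    merged = weighted_merge(base, [ft_a, ft_b], weights=[0.7, 0.3])
    print(merged["linear.weight"].shape)
```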