Model merging has become a practical post-training strategy for building a single multi-task large language model (LLM) by combining multiple task-specialized models. However, most existing approaches rely on post-hoc merging, in which task-specific models are merged only once, after training has finished. This one-shot aggregation often suffers from task interference, which erases information learned for individual tasks. In this work, we show that replacing post-hoc merging with an iterative many-shot merging protocol improves multi-task performance. Building on this insight, we propose METIS (Mitigating Erasure from Task Interference for Stable many-shot merging), a loss-aware many-shot merging method that addresses the information erasure seen in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS substantially improves performance on the worst-performing task, which indicates that it mitigates information erasure. (Project page: https://imkyungjin.github.io/METIS/)
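To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of what one merging round could look like. It is not the METIS algorithm: the abstract does not give the formulas, so everything here is an assumption made for illustration. The loss gap is assumed to be the merged model's loss minus the specialist's loss on each task, the weights are assumed to be the normalized gaps, and the consensus mask is assumed to be a sign-agreement rule in the style of sign-election merging. The function names `loss_gap_weights` and `consensus_merge` are hypothetical, and parameters are modeled as flat lists of floats.

```python
from typing import List, Sequence

Vector = List[float]


def loss_gap_weights(merged_losses: Sequence[float],
                     expert_losses: Sequence[float]) -> List[float]:
    """Assumed weighting rule: tasks where the merged model lags its
    specialist the most receive the largest merging weight."""
    # Gap = how much worse the merged model is than the task specialist.
    gaps = [max(m - e, 0.0) for m, e in zip(merged_losses, expert_losses)]
    total = sum(gaps)
    if total == 0.0:
        # No task is lagging: fall back to uniform weights.
        return [1.0 / len(gaps)] * len(gaps)
    return [g / total for g in gaps]


def consensus_merge(base: Vector,
                    experts: Sequence[Vector],
                    weights: Sequence[float]) -> Vector:
    """Assumed masking rule: per coordinate, keep only the weighted task
    updates whose sign agrees with the sign of their weighted sum."""
    merged = []
    for i, b in enumerate(base):
        # Weighted task updates (expert minus base) at coordinate i.
        weighted = [w * (expert[i] - b) for w, expert in zip(weights, experts)]
        elected = sum(weighted)  # its sign is the "consensus" direction
        # Mask out updates that point against the consensus direction.
        kept = [u for u in weighted if u * elected > 0.0]
        merged.append(b + sum(kept))
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0]
    experts = [[1.0, 1.0], [1.0, -1.0]]          # two toy task specialists
    weights = loss_gap_weights([1.0, 3.0], [0.5, 1.5])
    print(weights)
    print(consensus_merge(base, experts, weights))
```

In a many-shot protocol, a step like this would be applied repeatedly during training rather than once at the end, with the loss gaps re-measured before each round so that the task currently suffering most from interference is weighted most heavily.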