Model merging and task arithmetic have emerged as promising scalable approaches for merging multiple single-task checkpoints into one multi-task model, but their applicability is limited by significant performance loss. Previous works have attributed these drops to interference in the weight space and the erasure of important task-specific features. In contrast, in this work we show that the information required to solve each task is still preserved after merging, as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method that identifies these task supports given a collection of task vectors, and show that applying our masks to the multi-task vector retrieves >99% of the single-task accuracy, effectively compressing the individual checkpoints. We study the statistics of intersections among the constructed masks and reveal the existence of selfish and catastrophic weights, i.e., parameters that are important exclusively to one task and parameters that are irrelevant to all tasks but detrimental to multi-task fusion, respectively. For this reason, we propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches. Our experiments on vision and NLP benchmarks with up to 20 tasks show that Consensus Merging consistently improves existing approaches. Furthermore, our proposed compression scheme reduces storage from 57Gb to 8.2Gb while retaining 99.7% of the original performance.
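The mask construction and consensus step can be sketched in NumPy on flattened parameter vectors. This is a minimal sketch under stated assumptions, not the authors' implementation: the multi-task vector is taken to be the plain sum of task vectors, and a weight is assigned to task t when |τ_t| ≥ λ·|τ_MTL − τ_t|, i.e. when the task's own update is not dwarfed by the combined contribution of the other tasks. That threshold rule, the values of λ, α and k, and all function names below are illustrative assumptions, not details given in this abstract.

```python
import numpy as np


def task_vectors(theta_0, finetuned):
    """Stack tau_t = theta_t - theta_0 for every fine-tuned checkpoint (flattened)."""
    return np.stack([theta_t - theta_0 for theta_t in finetuned])


def tall_masks(taus, lam):
    """Hypothetical per-task masks: keep weight i for task t when
    |tau_t[i]| >= lam * |tau_mtl[i] - tau_t[i]| (assumed criterion)."""
    tau_mtl = taus.sum(axis=0)  # assumption: multi-task vector = plain sum
    others = tau_mtl - taus     # broadcast: combined update of all other tasks
    return np.abs(taus) >= lam * np.abs(others)


def reconstruct_task(theta_0, tau_mtl, mask_t):
    """Approximate the task-t model from the shared vector and a binary mask.
    Only theta_0, tau_mtl and one bit per weight per task need to be stored."""
    return theta_0 + mask_t * tau_mtl


def consensus_merge(theta_0, taus, masks, k=2, alpha=1.0):
    """Keep only weights that at least k tasks consider important; weights
    used by exactly one task (selfish) or by none (catastrophic) are zeroed."""
    support = masks.sum(axis=0)  # number of tasks using each weight
    consensus = support >= k
    merged = theta_0 + alpha * consensus * taus.sum(axis=0)
    return merged, support, consensus


# Toy usage: 3 tasks, 5 weights, zero pretrained model.
theta_0 = np.zeros(5)
finetuned = [
    np.array([1.0, 0.0, 0.5, 0.0, 0.1]),
    np.array([0.0, 1.0, 0.5, 0.0, 0.1]),
    np.array([0.0, 0.0, 0.0, 0.3, 0.1]),
]
taus = task_vectors(theta_0, finetuned)
masks = tall_masks(taus, lam=0.6)
tau_mtl = taus.sum(axis=0)
rec0 = reconstruct_task(theta_0, tau_mtl, masks[0])
merged, support, consensus = consensus_merge(theta_0, taus, masks, k=2)

print(support)  # [1 1 2 1 0]
print(merged)
```

On this toy input, weights 0, 1 and 3 are each claimed by a single task and weight 4 by none, so with k=2 only the shared weight 2 survives in the consensus-merged model. Note that `rec0` is [1, 0, 1, 0, 0] rather than task 0's exact update: a mask only selects entries of the merged vector, so shared entries still carry the other tasks' contributions, which is why retrieval is approximate.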