Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space, by adding the fine-tuned weights of different tasks. Its performance has been further improved by a linearity property, characterized by weight disentanglement. Yet conventional linearization methods (e.g., NTK linearization) not only double the training time and cost but also hurt single-task performance. We propose a simple yet effective and efficient method that fine-tunes only linear layers, improving weight disentanglement and efficiency simultaneously. Specifically, our study reveals that fine-tuning only the linear layers in the attention modules keeps the whole model in a linear regime, significantly improving weight disentanglement. To further understand how our method improves the disentanglement of task arithmetic, we present a comprehensive study of task arithmetic that differentiates the roles of the representation model and the task-specific model. In particular, we find that the representation model plays an important role in improving weight disentanglement, whereas task-specific models such as classification heads can degrade weight disentanglement performance. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models.
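To make the weight-space editing concrete, the following is a minimal sketch of task arithmetic on toy parameter dictionaries, not the authors' implementation. It assumes the common formulation in which each task contributes a task vector (fine-tuned weights minus pre-trained weights) and the edited model is the pre-trained weights plus a scaled sum of those vectors. The parameter names, the `.attn.` naming convention, the helper functions, and the scaling coefficient `alpha` are all hypothetical and chosen only for illustration.

```python
# Toy sketch of task arithmetic on flat parameter dictionaries.
# All parameter names and the ".attn." naming scheme are hypothetical.

def task_vector(pretrained, finetuned, trainable=lambda name: True):
    """Return (finetuned - pretrained) for the parameters selected by `trainable`."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
        if trainable(name)
    }


def apply_task_vectors(pretrained, task_vectors, alpha=1.0):
    """Return pretrained + alpha * (sum of task vectors), leaving inputs unchanged."""
    edited = {name: list(values) for name, values in pretrained.items()}
    for vector in task_vectors:
        for name, delta in vector.items():
            edited[name] = [w + alpha * d for w, d in zip(edited[name], delta)]
    return edited


def is_attention_linear(name):
    # Assumption: attention projection weights contain ".attn." in their name.
    return ".attn." in name and name.endswith(".weight")


pretrained = {
    "block0.attn.q_proj.weight": [1.0, 2.0],
    "block0.mlp.fc.weight": [0.5, 0.5],
}
# Two toy "fine-tuned" models in which only the attention weights have moved.
finetuned_a = {
    "block0.attn.q_proj.weight": [1.5, 2.0],
    "block0.mlp.fc.weight": [0.5, 0.5],
}
finetuned_b = {
    "block0.attn.q_proj.weight": [1.0, 1.0],
    "block0.mlp.fc.weight": [0.5, 0.5],
}

tv_a = task_vector(pretrained, finetuned_a, is_attention_linear)
tv_b = task_vector(pretrained, finetuned_b, is_attention_linear)
edited = apply_task_vectors(pretrained, [tv_a, tv_b], alpha=1.0)
```

Restricting the task vectors to the attention weights mirrors, in a very reduced form, the idea of fine-tuning only the linear layers in the attention modules; in a real model the selection would act on which parameters are trainable during fine-tuning rather than on toy lists.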