Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space: by adding the fine-tuned weights of different tasks, the model's performance can be improved on these tasks, while negating them leads to task forgetting. Yet our understanding of why task arithmetic works, and of its underlying principles, remains limited. We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks. Notably, we show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across multiple task arithmetic benchmarks and diverse models. Building on these findings, we provide theoretical and empirical analyses of the neural tangent kernel (NTK) of these models and establish a compelling link between task arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models through NTK linearization.
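The addition and negation operations described above can be illustrated with a minimal sketch. It assumes the usual task-vector formulation, in which each task contributes the difference between its fine-tuned and pre-trained weights, and it represents weights as flat lists of floats. The function names, the scaling coefficient `alpha`, and the toy values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Toy sketch of task arithmetic on flat weight lists.
# All names and numbers here are illustrative assumptions.

def task_vector(pretrained, finetuned):
    """Return tau = theta_finetuned - theta_pretrained, elementwise."""
    return [f - p for p, f in zip(pretrained, finetuned)]


def apply_task_vectors(pretrained, task_vectors, alpha=1.0):
    """Return theta_0 + alpha * sum of task vectors.

    A positive alpha adds the tasks; a negative alpha negates them.
    """
    edited = list(pretrained)
    for tau in task_vectors:
        edited = [w + alpha * d for w, d in zip(edited, tau)]
    return edited


# Hypothetical pre-trained weights and two fine-tuned copies.
theta_0 = [1.0, 2.0, 3.0]
theta_a = [1.5, 2.0, 3.0]  # fine-tuned on task A (changes weight 0)
theta_b = [1.0, 2.0, 2.0]  # fine-tuned on task B (changes weight 2)

tau_a = task_vector(theta_0, theta_a)
tau_b = task_vector(theta_0, theta_b)

# Addition: combine both tasks in a single set of weights.
theta_add = apply_task_vectors(theta_0, [tau_a, tau_b])

# Negation: move away from task A to induce forgetting.
theta_neg = apply_task_vectors(theta_0, [tau_a], alpha=-1.0)
```

In this toy case the two task vectors touch different coordinates, so adding them does not let one task's update interfere with the other. This is only a loose weight-space analogy for the disentanglement property discussed above, which is defined in terms of the model's function, not individual coordinates.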