Task arithmetic has recently emerged as a cost-effective and scalable approach to editing pre-trained models directly in weight space: adding the fine-tuned weights of different tasks improves the model's performance on those tasks, while negating them leads to task forgetting. Yet our understanding of why task arithmetic works, and of its underlying principles, remains limited. We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks. Notably, we show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across multiple task arithmetic benchmarks and diverse models. Building on these findings, we provide theoretical and empirical analyses of the neural tangent kernel (NTK) of these models and establish a compelling link between task arithmetic and the spatial localization of the NTK eigenfunctions. Overall, our work uncovers novel insights into the fundamental mechanisms of task arithmetic and offers a more reliable and effective approach to editing pre-trained models through NTK linearization.
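To make the two operations named above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes weights are flattened into plain Python lists, and the toy model `f`, its tangent-space counterpart `f_lin`, and all numeric values are hypothetical choices made for illustration. The first half forms task vectors (fine-tuned weights minus pre-trained weights) and adds or negates them; the second half shows that a model linearized around the pre-trained weights responds additively to a sum of task vectors.

```python
import math

def task_vector(theta_ft, theta_0):
    # Task vector: fine-tuned weights minus pre-trained weights.
    return [a - b for a, b in zip(theta_ft, theta_0)]

def apply_task_vectors(theta_0, task_vectors, alphas):
    # Edited weights: theta_0 + sum_t alpha_t * tau_t.
    edited = list(theta_0)
    for tau, alpha in zip(task_vectors, alphas):
        edited = [w + alpha * d for w, d in zip(edited, tau)]
    return edited

def f(x, theta):
    # Hypothetical toy model: a single tanh unit.
    return math.tanh(sum(w * xi for w, xi in zip(theta, x)))

def f_lin(x, theta, theta_0):
    # First-order Taylor expansion of f in the weights around theta_0:
    # f(x; theta_0) + (theta - theta_0) . grad_theta f(x; theta_0).
    z0 = sum(w * xi for w, xi in zip(theta_0, x))
    grad = [(1.0 - math.tanh(z0) ** 2) * xi for xi in x]
    delta = [a - b for a, b in zip(theta, theta_0)]
    return math.tanh(z0) + sum(g * d for g, d in zip(grad, delta))

# Made-up weights: one pre-trained checkpoint and two fine-tuned ones.
theta_0 = [0.0, 1.0, -1.0, 0.5]
theta_a = [0.5, 1.0, -1.0, 0.5]   # stands in for fine-tuning on task A
theta_b = [0.0, 1.0, -0.5, 0.5]   # stands in for fine-tuning on task B

tau_a = task_vector(theta_a, theta_0)
tau_b = task_vector(theta_b, theta_0)

# Addition: combine both tasks. Negation: forget task A.
multi_task = apply_task_vectors(theta_0, [tau_a, tau_b], [1.0, 1.0])
forget_a = apply_task_vectors(theta_0, [tau_a], [-1.0])

# For the linearized model, the change in output caused by tau_a + tau_b
# equals the sum of the changes caused by each task vector alone.
x = [1.0, 2.0, -1.0, 0.5]
base = f_lin(x, theta_0, theta_0)
joint_shift = f_lin(x, multi_task, theta_0) - base
shift_a = f_lin(x, theta_a, theta_0) - base
shift_b = f_lin(x, theta_b, theta_0) - base
print(multi_task, forget_a, joint_shift, shift_a + shift_b)
```

Additivity in the weights follows from linearization alone. Weight disentanglement asks for more: each task's contribution must also vanish on inputs outside that task's region of function space, which this toy example does not attempt to demonstrate.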