When a new release of a foundation model is published, practitioners typically need to repeat fine-tuning, even if the same task was already tackled in the previous version. A promising alternative is to reuse the parameter changes (i.e., task vectors) that capture how a model adapts to a specific task. However, these vectors often fail to transfer across different pre-trained models because their parameter spaces are misaligned. In this work, we show that successful transfer depends strongly on the gradient-sign structure of the new model. Based on this insight, we propose GradFix, which approximates the ideal sign structure and leverages it to transfer knowledge using only a handful of labeled samples. Notably, this requires no additional fine-tuning: we only compute a few target-model gradients without parameter updates and mask the source task vector accordingly. This yields an update that is locally aligned with the target loss landscape, effectively rebasing the task vector onto the new pre-training. We provide a theoretical guarantee that our method ensures first-order descent. Empirically, we demonstrate significant performance gains on vision and language benchmarks, consistently outperforming naive task vector addition and few-shot fine-tuning. We further show that transporting task vectors improves multi-task and multi-source model merging. Code is available at https://github.com/fillo-rinaldi/GradFix.
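The masking step described above can be sketched as follows. This is a minimal NumPy illustration based only on the abstract's description, not the paper's reference implementation: it assumes the mask keeps exactly those task-vector components whose sign agrees with the target model's negative few-shot gradient (the descent direction), which is sufficient for the first-order descent property mentioned in the abstract.

```python
import numpy as np

def gradfix_transfer(task_vector, target_grad):
    """Sketch of sign-based task-vector masking (names are hypothetical).

    task_vector: flattened source task vector (fine-tuned minus pre-trained
                 source weights).
    target_grad: few-shot gradient of the target loss at the target
                 pre-trained weights (no parameter updates performed).

    Components pointing against the descent direction (-target_grad) are
    zeroed, so the surviving update satisfies <grad, masked_tv> <= 0,
    i.e., it is a first-order descent direction for the target loss.
    """
    mask = np.sign(task_vector) == np.sign(-target_grad)
    return task_vector * mask

# Toy illustration with synthetic numbers (not from the paper):
tau = np.array([0.5, -0.3, 0.2, -0.1])   # source task vector
grad = np.array([-1.0, 0.2, 0.4, -0.5])  # few-shot target gradient
tau_fixed = gradfix_transfer(tau, grad)
# Components 2 and 3 disagree with -grad and are masked out,
# and the masked update is a descent direction:
assert np.dot(grad, tau_fixed) <= 0
```

The transferred model would then be the target pre-trained weights plus the masked vector; because masking only zeroes coordinates, the update stays within the span of the original task vector.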