Adapting large pre-trained models to downstream tasks often produces task-specific parameter updates that are expensive to relearn for every model variant. While recent work has shown that such updates can be transferred between models with identical architectures, transferring them across models of different widths remains largely unexplored. In this work, we introduce Theseus, a training-free method for transporting task-specific updates across heterogeneous models. Rather than matching parameters directly, we characterize a task update by the functional effect it induces on intermediate representations. We formalize task-vector transport as a functional matching problem on observed activations and show that, after aligning representation spaces via orthogonal Procrustes analysis, it admits a stable closed-form solution that preserves the geometry of the update. We evaluate Theseus on vision and language models of different widths, observing consistent improvements over strong baselines without any additional training or backpropagation. Our results demonstrate that task updates can be meaningfully transferred across architectures when task identity is defined functionally rather than parametrically.
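The abstract does not spell out Theseus's procedure, but the orthogonal Procrustes alignment it references has a standard closed form. Below is a minimal, illustrative sketch in NumPy; the function names (`procrustes_map`, `transport_update`) and the conjugation step are our assumptions for illustration, not the paper's published method. Given activations collected from the same probe inputs on a source and a target model, it computes alignment maps via SVD and uses them to carry a per-layer task update into the target model's representation space.

```python
import numpy as np

def procrustes_map(A_src, A_tgt):
    """Closed-form (semi-)orthogonal Q minimizing ||A_tgt @ Q - A_src||_F.

    A_src: (n, d_src) and A_tgt: (n, d_tgt) hold activations for the
    same n probe inputs, one row per input.
    """
    # Orthogonal Procrustes: SVD of the cross-covariance gives Q = U @ Vt.
    U, _, Vt = np.linalg.svd(A_tgt.T @ A_src, full_matrices=False)
    return U @ Vt  # (d_tgt, d_src); Q.T @ Q = I when d_tgt >= d_src

def transport_update(delta_W_src, X_src, X_tgt, Y_src, Y_tgt):
    """Hypothetical transport of one layer's task update (an assumed
    reading of the abstract, not the paper's verified algorithm).

    delta_W_src: (d_out_src, d_in_src) fine-tuning update for one layer.
    X_*: activations entering the layer; Y_*: activations leaving it.
    """
    Q_in = procrustes_map(X_src, X_tgt)   # (d_in_tgt, d_in_src)
    Q_out = procrustes_map(Y_src, Y_tgt)  # (d_out_tgt, d_out_src)
    # Conjugate the update by the alignment maps so its functional
    # effect on target activations mimics its effect on source ones.
    return Q_out @ delta_W_src @ Q_in.T   # (d_out_tgt, d_in_tgt)
```

When source and target widths match, the alignment reduces to classical orthogonal Procrustes; for mismatched widths, the truncated SVD yields a semi-orthogonal map, which is one common generalization.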