Model merging offers a scalable alternative to multi-task learning but often yields suboptimal performance on classification tasks. We attribute this degradation to a geometric misalignment between the merged encoder and the static task-specific classifier heads. Existing methods typically rely on auxiliary parameters to enforce strict representation alignment. We challenge this approach by revealing that the misalignment is predominantly an orthogonal transformation, rendering such strict alignment unnecessary. Leveraging this insight, we propose MOMA (Masked Orthogonal Matrix Alignment), which rectifies the misalignment by jointly optimizing a global multi-task vector mask and task-specific orthogonal transformations. Crucially, MOMA absorbs the newly introduced parameters directly into the existing model weights, achieving performance comparable to state-of-the-art baselines with zero additional parameters and zero added inference cost.
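To make the mechanics concrete, below is a minimal PyTorch sketch of the two ingredients the abstract names: masked task-vector merging, and a per-task orthogonal alignment that is later folded into the classifier head. This is a sketch under stated assumptions, not the paper's reference implementation; the identifiers (merge_with_mask, OrthogonalAligner, absorb_into_head, mask_logits) and the sigmoid relaxation of the mask are illustrative choices.

```python
# Illustrative sketch of the MOMA idea described above; names and the soft
# mask are our assumptions, not the authors' code.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

def merge_with_mask(w_pre, task_weights, mask_logits):
    """Masked task arithmetic: W = W_pre + m * sum_t (W_t - W_pre),
    with a single global mask m shared across tasks."""
    taus = [w_t - w_pre for w_t in task_weights]   # task vectors
    mask = torch.sigmoid(mask_logits)              # soft mask in (0, 1)
    return w_pre + mask * sum(taus)

class OrthogonalAligner(nn.Module):
    """Learnable orthogonal map Q_t that re-aligns merged-encoder features
    with a frozen task-specific classifier head."""
    def __init__(self, dim):
        super().__init__()
        # The parametrization keeps the weight orthogonal during training.
        self.linear = orthogonal(nn.Linear(dim, dim, bias=False))

    def forward(self, z):
        return self.linear(z)  # computes z @ Q_t^T

def absorb_into_head(head, aligner):
    """Fold Q_t into the head: logits = (z Q^T) V^T = z (V Q)^T, so replacing
    V by V Q leaves predictions unchanged and Q_t can be discarded."""
    with torch.no_grad():
        q = aligner.linear.weight          # the orthogonal matrix Q_t
        head.weight.copy_(head.weight @ q)
    return head

# Usage: optimize the aligner against the frozen head, then absorb it.
dim, n_cls = 512, 10
head = nn.Linear(dim, n_cls, bias=False)
aligner = OrthogonalAligner(dim)
z = torch.randn(4, dim)                    # merged-encoder features
logits_train = head(aligner(z))            # used while optimizing mask and Q_t
head = absorb_into_head(head, aligner)
logits_infer = head(z)                     # identical logits, zero extra params
assert torch.allclose(logits_train, logits_infer, atol=1e-5)
```

The orthogonal parametrization constrains Q_t to the orthogonal group throughout optimization, so absorbing it into the head is exact and never distorts feature norms, which is what permits the zero-parameter, zero-cost inference claim.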