Model merging (MM) has gained significant attention as a cost-effective way to integrate multiple task-specific models into a single unified model. However, recent work reveals that MM is highly susceptible to backdoor attacks. Existing defenses based on task arithmetic often fail to eliminate backdoors without substantially degrading clean-task performance, because they rely on direct parameter-space editing. To address this gap, we propose Linear Feature Path Minimization (LFPM), a backdoor mitigation framework for model merging that introduces an anti-backdoor task vector into the backdoored merged model. Unlike prior approaches, LFPM formulates the backdoor robustness of the merged model from a unified feature-space perspective under the Cross-Task Linearity (CTL) framework, which exploits the approximate linearity of features across tasks. This perspective guides the optimization of the anti-backdoor task vector so that it suppresses backdoors while preserving clean-task performance. Furthermore, we introduce an effective optimization mechanism based on gradient accumulation and a loss path integral, which ensures robust backdoor suppression along the interpolation path. Extensive experiments demonstrate that LFPM consistently provides strong robustness against backdoor attacks in both full fine-tuning and parameter-efficient fine-tuning (PEFT) settings.
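To make the mechanics of "gradient accumulation and a loss path integral" concrete, the following is a minimal sketch under stated assumptions; it is not the LFPM objective, which is defined in feature space under CTL. It assumes task-arithmetic merging (theta_merged = theta0 + lam * sum of task vectors), an interpolation path theta(alpha) = theta_merged + alpha * tau_anti for alpha in [0, 1], and a midpoint-rule discretization of the integral of the loss over that path. The toy quadratic loss, the helper names (`merge_task_vectors`, `path_integral_step`, `make_quadratic_loss`) and all numbers are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical interface: maps parameters theta to (loss, dL/dtheta).
LossAndGrad = Callable[[Vector], Tuple[float, Vector]]


def merge_task_vectors(theta0: Vector, task_vectors: List[Vector], lam: float) -> Vector:
    """Task arithmetic: theta_merged = theta0 + lam * sum_i tau_i."""
    merged = list(theta0)
    for tau in task_vectors:
        for j, t in enumerate(tau):
            merged[j] += lam * t
    return merged


def path_integral_step(theta_merged: Vector, tau_anti: Vector,
                       loss_and_grad: LossAndGrad, num_points: int,
                       lr: float) -> Tuple[Vector, float]:
    """One gradient step on the midpoint-rule estimate
    (1/K) * sum_k L(theta_merged + alpha_k * tau_anti), alpha_k = (k + 0.5) / K,
    of the path integral int_0^1 L(theta_merged + alpha * tau_anti) d alpha.
    Gradients from all K path points are accumulated before a single update.
    The returned loss is evaluated at tau_anti before the update."""
    grad_acc = [0.0] * len(tau_anti)
    path_loss = 0.0
    for k in range(num_points):
        alpha = (k + 0.5) / num_points
        theta = [m + alpha * a for m, a in zip(theta_merged, tau_anti)]
        loss, grad_theta = loss_and_grad(theta)
        path_loss += loss / num_points
        # Chain rule: d theta / d tau_anti = alpha (elementwise).
        for j, g in enumerate(grad_theta):
            grad_acc[j] += alpha * g / num_points
    new_tau = [a - lr * g for a, g in zip(tau_anti, grad_acc)]
    return new_tau, path_loss


def make_quadratic_loss(target: Vector) -> LossAndGrad:
    """Toy stand-in loss L(theta) = ||theta - target||^2 (hypothetical)."""
    def loss_and_grad(theta: Vector) -> Tuple[float, Vector]:
        diff = [t - c for t, c in zip(theta, target)]
        return sum(d * d for d in diff), [2.0 * d for d in diff]
    return loss_and_grad


if __name__ == "__main__":
    merged = merge_task_vectors([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], lam=0.5)
    loss_fn = make_quadratic_loss(target=[0.0, 0.5])
    tau = [0.0, 0.0]
    for _ in range(200):
        tau, path_loss = path_integral_step(merged, tau, loss_fn, num_points=8, lr=0.5)
    print(merged, tau, path_loss)
```

In this toy setting the anti-backdoor vector converges to roughly [-0.75, 0.0] rather than [-0.5, 0.0], the value that would zero the loss only at alpha = 1. Averaging over the whole path trades some endpoint loss for lower loss at intermediate interpolation coefficients, which is the intuition behind optimizing along the path instead of at a single merged point.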