The rise of model hubs has made reusable model components easy to access, turning model merging into a practical tool for combining capabilities. Yet this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, in practice they provide inconsistent protection across diverse architectures and release formats. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection directly into the fine-tuning updates, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent mechanisms, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process: released weights remain effective in standalone use but degrade under the re-scaling that merging typically induces, undermining unauthorized merging.
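The intuition behind the re-scaling proxy can be written out in a minimal sketch. The coefficient $\lambda$, the trade-off weight $\beta$, and the loss terms below are illustrative assumptions, not the paper's exact objective. When a released update $\Delta\theta$ is merged with others, it typically enters the combination with a coefficient,
\[
\theta_{\text{merged}} \;=\; \theta_{\text{base}} \;+\; \lambda\,\Delta\theta \;+\; \sum_{j} \lambda_j\,\Delta\theta_j, \qquad 0 < \lambda < 1,
\]
so the merged model effectively sees a re-scaled copy $\lambda\,\Delta\theta$ of the released update. Fine-tuning can therefore encourage low task loss at $\lambda = 1$ (standalone use) while penalizing utility at other scales, e.g.
\[
\min_{\Delta\theta}\; \mathcal{L}\!\left(\theta_{\text{base}} + \Delta\theta\right) \;-\; \beta\; \mathbb{E}_{\lambda \sim \mathcal{U}(0,1)}\!\left[\mathcal{L}\!\left(\theta_{\text{base}} + \lambda\,\Delta\theta\right)\right],
\]
which preserves standalone performance while degrading any re-scaled (hence merged) variant, without referencing a specific architecture or release format.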