The rise of model hubs has made reusable model components easy to access, turning model merging into a practical tool for combining capabilities. Yet this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they protect inconsistently across architectures and release formats. To close this gap, we propose Trap$^2$, an architecture-agnostic protection framework that encodes protection into the weight updates during fine-tuning, regardless of whether they are released as adapters or full models. Rather than relying on architecture-dependent mechanisms, Trap$^2$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use but degrades them under the re-scaling that often arises in merging, undermining unauthorized recomposition.
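To see why re-scaling can stand in for merging, consider uniform weight averaging, one common merging scheme and chosen here only as an assumption for illustration. Averaging two models fine-tuned from the same base gives the base plus each fine-tuning update scaled by 0.5. The sketch below checks this on toy weight vectors. It is not the Trap$^2$ training procedure, and the helper names `rescale_update` and `average_merge` are hypothetical.

```python
import math

def rescale_update(base, tuned, alpha):
    """Return base + alpha * (tuned - base), i.e. the update re-scaled by alpha."""
    return [b + alpha * (t - b) for b, t in zip(base, tuned)]

def average_merge(models):
    """Uniformly average a list of weight vectors (one simple merging scheme)."""
    k = len(models)
    return [sum(ws) / k for ws in zip(*models)]

# Toy weights: a shared base and two fine-tuning updates (illustrative values only).
base = [0.5, -1.0, 2.0]
delta_a = [0.10, 0.20, -0.30]
delta_b = [-0.05, 0.00, 0.40]
model_a = [b + d for b, d in zip(base, delta_a)]
model_b = [b + d for b, d in zip(base, delta_b)]

merged = average_merge([model_a, model_b])

# The same result, written as the base plus each update scaled by 1/2.
half_a = rescale_update(base, model_a, 0.5)
reconstructed = [h + 0.5 * d for h, d in zip(half_a, delta_b)]

assert all(math.isclose(m, r, abs_tol=1e-12) for m, r in zip(merged, reconstructed))
```

Under this assumption, an update that works at scale 1 but degrades at scales such as 0.5 would lose its effect once averaged with another model, which is the behavior the abstract describes.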