The open-weight language model ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from diverse sources. A critical prerequisite for applying these methods across different model families is tokenizer transplant, which aligns incompatible vocabularies to a shared embedding space. We demonstrate that this essential interoperability step introduces a supply-chain vulnerability: we engineer a single breaker token that is functionally inert in a donor model yet reliably reconstructs into a high-salience malicious feature after transplant into a base model. By exploiting the geometry of coefficient reuse, our attack sabotages the base model's generation while leaving the donor's utility statistically indistinguishable from nominal behavior. We formalize this as a dual-objective optimization problem and instantiate the attack using a sparse solver. Empirically, the attack is training-free and evades outlier detection, while demonstrating structural persistence against fine-tuning and weight merging, highlighting a hidden risk in the pipeline of modular AI composition. Code is available at https://github.com/xz-liu/tokenforge.
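To make the coefficient-reuse mechanism concrete, the sketch below illustrates one common style of tokenizer transplant: a new token's donor embedding is expressed as a sparse combination of anchor tokens shared by both vocabularies, and the same coefficients are then applied to the base model's anchor embeddings. All names, dimensions, and the greedy sparse solver are illustrative assumptions, not the paper's actual implementation; the point is only that the coefficient vector is the object carried across models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: embedding tables for a donor and a base model
# restricted to a set of anchor tokens present in both vocabularies.
d, n_anchors = 16, 32
E_donor = rng.normal(size=(n_anchors, d))  # donor embeddings of shared anchors
E_base = rng.normal(size=(n_anchors, d))   # base embeddings of the same anchors

def sparse_coeffs(target, dictionary, k=4):
    """Greedy orthogonal matching pursuit: approximate `target` as a
    k-sparse combination of the rows of `dictionary`."""
    residual = target.copy()
    support = []
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(k):
        # Pick the anchor most correlated with the current residual.
        scores = np.abs(dictionary @ residual)
        scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        # Re-fit coefficients on the selected support by least squares.
        sub = dictionary[support]
        sol, *_ = np.linalg.lstsq(sub.T, target, rcond=None)
        coeffs[:] = 0.0
        coeffs[support] = sol
        residual = target - dictionary.T @ coeffs
    return coeffs

# Donor embedding of a new token (random here; the attack would craft it).
e_new_donor = rng.normal(size=d)
c = sparse_coeffs(e_new_donor, E_donor, k=4)

# Transplant by coefficient reuse: the SAME sparse coefficients are applied
# to the base model's anchors to synthesize the token's base-space embedding.
e_new_base = E_base.T @ c

print("nonzero coefficients:", int(np.count_nonzero(c)))
print("base-space embedding shape:", e_new_base.shape)
```

Because the donor and base anchor geometries differ, an embedding that is innocuous in donor space can reconstruct to a very different direction in base space; the attack described in the abstract optimizes the donor embedding so that this reconstructed direction is malicious.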