Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the underlying pseudo-random number generator (PRNG) is trustworthy. This work introduces SeedHijack, the first supply-chain attack on LLM watermarking that is simultaneously (i) blind: it requires no knowledge of the watermark key, the detector, or the model logits; (ii) integrity-preserving: it amplifies rather than erases the watermark signal; and (iii) orthogonal to detection: the attack-induced bias is statistically independent of all content-side detector statistics, so amplification and evasion coexist without a trade-off. Rather than perturbing generated text, SeedHijack replaces the PRNG at the supply-chain layer, biasing green-list selection without altering output tokens or degrading text quality. Across three watermarking schemes and three open-source LLMs, the attack triggers none of the six state-of-the-art content-side statistical detectors evaluated (0/6) while inflating the watermark z-score by up to 2.42×. System-level defenses such as entropy-source attestation fall outside this content-side evaluation and remain orthogonal, complementary safeguards. We further show that a quantum random number generator (QRNG) countermeasure fully neutralizes the attack while preserving benign watermarking utility. These findings establish PRNG integrity as a first-class security requirement for cryptographic content-provenance systems.
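To make the PRNG dependency concrete, the sketch below implements a simplified KGW-style green list and its z-score detector: each position's green list is a fraction γ of the vocabulary chosen by a PRNG seeded from the secret key and the previous token, and detection computes z = (|s|_G − γT) / sqrt(Tγ(1 − γ)) over T scored tokens. This is an illustrative sketch, not the paper's code and not SeedHijack itself. The SHA-256 seeding, the use of Python's `random.Random` for the permutation, and the names `seed_from_context`, `green_list`, `z_score`, and `fixed_rng` are all hypothetical assumptions. The pluggable `rng_factory` parameter only shows that the green-list partition is entirely determined by whichever generator is supplied.

```python
import hashlib
import math
import random


def seed_from_context(key: int, prev_token: int) -> int:
    # Hypothetical seeding rule (assumption): hash the secret key with the previous token id.
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def green_list(prev_token, vocab_size, gamma, key, rng_factory=random.Random):
    # The partition depends only on the PRNG returned by rng_factory for this seed.
    rng = rng_factory(seed_from_context(key, prev_token))
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])


def z_score(tokens, vocab_size, gamma, key, rng_factory=random.Random):
    # Count tokens that land in the green list induced by their predecessor.
    hits = sum(
        tok in green_list(prev, vocab_size, gamma, key, rng_factory)
        for prev, tok in zip(tokens, tokens[1:])
    )
    t = len(tokens) - 1  # number of scored tokens T
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def fixed_rng(_seed):
    # Hypothetical substituted PRNG that ignores its seed, so the key no longer matters.
    return random.Random(0)


if __name__ == "__main__":
    VOCAB, GAMMA, KEY = 1000, 0.25, 42
    # Build a sequence that always picks a green token, then score it.
    seq = [0]
    for _ in range(100):
        seq.append(min(green_list(seq[-1], VOCAB, GAMMA, KEY)))
    print(round(z_score(seq, VOCAB, GAMMA, KEY), 2))  # → 17.32
    # With a seed-ignoring generator, different keys yield identical green lists.
    print(green_list(7, VOCAB, GAMMA, 1, fixed_rng) == green_list(7, VOCAB, GAMMA, 2, fixed_rng))  # → True
```

An all-green sequence of T = 100 tokens with γ = 0.25 gives z = 75 / sqrt(18.75) ≈ 17.32. The second check illustrates why the security argument rests on the generator: once the PRNG is swapped, the partition no longer reflects the secret key, even though the detector code is unchanged.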