With the increasing use of large language models (LLMs) in daily life, concerns have emerged about their potential misuse and societal impact. Watermarking has been proposed as a way to trace the usage of specific models by injecting detectable patterns into their generated text. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility) while ensuring a high detection rate (efficacy), even when the text is partially altered (robustness). Although many methods have been proposed, none achieves all three properties simultaneously, which reveals an inherent trade-off. This paper uses a key-centered scheme that unifies existing watermarking techniques by decomposing a watermark into two distinct modules: a key module and a mark module. Through this decomposition, we demonstrate for the first time that the key module contributes significantly to the trade-off observed in prior methods. Specifically, the trade-off reflects a conflict between the size of the key sampling space during generation and the complexity of key restoration during detection. To this end, we introduce \textbf{WaterPool}, a simple yet effective key module that preserves the complete key sampling space that imperceptibility requires while using semantics-based search to improve key restoration. WaterPool can be integrated with most watermarks as a plug-in. Our experiments with three well-known watermarking techniques show that WaterPool significantly enhances their performance, achieving near-optimal imperceptibility and markedly improving efficacy and robustness (+12.73\% for KGW, +20.27\% for EXP, +7.27\% for ITS).
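To make the key/mark decomposition concrete, the following is a minimal toy sketch in Python. It is not the WaterPool implementation. It assumes a KGW-style mark module (a key and the previous token seed a "green list" of allowed tokens), a uniformly random toy "model" in place of an LLM, and an exhaustive search over the key pool as the key restoration step. All names (`green_list`, `generate`, `green_fraction`, `restore_key`) and parameters are hypothetical and chosen only for illustration.

```python
import hashlib
import random

VOCAB = list(range(50))  # toy vocabulary of token ids (assumption)


def green_list(key, prev_token, fraction=0.5):
    """Mark module (KGW-style assumption): the key and the previous token
    seed a pseudo-random split of the vocabulary; return the 'green' part."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))


def generate(key, length, rng):
    """Toy stand-in for an LLM: each new token is drawn uniformly from the
    green list induced by the key and the previous token (a hard watermark)."""
    tokens = [rng.choice(VOCAB)]
    for _ in range(length):
        greens = sorted(green_list(key, tokens[-1]))
        tokens.append(rng.choice(greens))
    return tokens


def green_fraction(key, tokens):
    """Detection statistic: share of tokens that fall in the green list
    predicted by `key` from their predecessor."""
    hits = sum(
        tokens[i] in green_list(key, tokens[i - 1])
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)


def restore_key(tokens, key_pool):
    """Key module at detection time, done naively: try every key in the pool
    and keep the best-scoring one. Cost grows linearly with the pool size."""
    return max(key_pool, key=lambda k: green_fraction(k, tokens))


if __name__ == "__main__":
    key_pool = list(range(20))              # key sampling space (toy size)
    secret_key = random.Random(1).choice(key_pool)
    text = generate(secret_key, 60, random.Random(0))
    print(restore_key(text, key_pool) == secret_key)
```

In this sketch, enlarging `key_pool` makes the choice of key less predictable at generation time but makes `restore_key` proportionally more expensive at detection time, which mirrors the conflict described in the abstract. According to the abstract, WaterPool addresses this by replacing exhaustive restoration with a semantics-based search; that search is not implemented here.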