Large language models (LLMs) generate high-quality responses that may nonetheless contain misinformation, underscoring the need to regulate them by distinguishing AI-generated text from human-written text. Watermarking plays a pivotal role here: it embeds hidden markers, imperceptible to humans, into text during the LLM inference phase. Current watermarking algorithms, however, struggle to achieve both the detectability of the inserted watermark and the semantic integrity of the generated text, because enhancing one often undermines the other. To overcome this, we introduce a novel multi-objective optimization (MOO) approach to watermarking that uses lightweight networks to generate token-specific watermarking logits and splitting ratios. By using MOO to optimize detection and semantic objective functions jointly, our method achieves detectability and semantic integrity simultaneously. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of LLM-generated text while maintaining its semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.
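The abstract names the mechanism (per-token splitting ratios and watermark logits produced by lightweight networks, balanced with MOO) but gives no details. The following is a toy sketch under stated assumptions, not the authors' implementation. It assumes a KGW-style scheme: the vocabulary is pseudo-randomly split, seeded by the previous token, into a "green list" covering a fraction γ of the vocabulary, and a bias δ is added to green-token logits, with γ and δ varying per token. `gamma_net` and `delta_net` are hypothetical fixed formulas standing in for the trained lightweight networks. The vocabulary size, key, and Gaussian "language model" logits are placeholders. Detection uses the fixed-γ z-test generalized to per-token ratios, z = (|s|_G − Σγ_t) / sqrt(Σγ_t(1 − γ_t)), which is assumed here rather than taken from the source. `mgda_two_task_weight` shows one standard MOO step (two-task MGDA, the min-norm convex combination of two gradients). The abstract does not say which MOO solver the method uses.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary (assumption); real LLM vocabularies are far larger
KEY = "secret-key"  # hypothetical watermark key


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x: float) -> float:
    return math.log1p(math.exp(x))


# Hypothetical stand-ins for the trained lightweight networks: fixed formulas
# of the previous token id, used only so the sketch runs end to end.
def gamma_net(prev_token: int) -> float:
    """Token-specific splitting ratio, strictly inside (0, 1)."""
    return 0.1 + 0.8 * sigmoid(0.3 * (prev_token % 7) - 1.0)


def delta_net(prev_token: int) -> float:
    """Token-specific watermark logit bias, always positive."""
    return softplus(0.5 * (prev_token % 5))


def green_list(prev_token: int, gamma: float) -> set[int]:
    """Pseudo-randomly choose about gamma * |V| green tokens, seeded by key and previous token."""
    digest = hashlib.sha256(f"{KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    size = max(1, round(gamma * VOCAB_SIZE))
    return set(rng.sample(range(VOCAB_SIZE), size))


def watermark_logits(logits: list[float], prev_token: int) -> list[float]:
    """Add the token-specific bias delta to every green-list logit."""
    greens = green_list(prev_token, gamma_net(prev_token))
    delta = delta_net(prev_token)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def sample(logits: list[float], rng: random.Random) -> int:
    """Draw one token id from softmax(logits)."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(n_tokens: int, watermark: bool, seed: int = 0) -> list[int]:
    """Sample from a toy 'language model' whose logits are i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        if watermark:
            logits = watermark_logits(logits, tokens[-1])
        tokens.append(sample(logits, rng))
    return tokens


def z_score(tokens: list[int]) -> float:
    """One-sided z-test using a per-token green fraction instead of one fixed gamma."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    hits, mean, var = 0, 0.0, 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        greens = green_list(prev, gamma_net(prev))
        p = len(greens) / VOCAB_SIZE  # actual green fraction after rounding
        hits += int(cur in greens)
        mean += p
        var += p * (1.0 - p)
    return (hits - mean) / math.sqrt(var)


def mgda_two_task_weight(g1: list[float], g2: list[float]) -> float:
    """Weight alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2|| (two-task MGDA)."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every convex weight gives the same vector
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, alpha))


if __name__ == "__main__":
    print("watermarked z:", round(z_score(generate(200, watermark=True)), 2))
    print("unwatermarked z:", round(z_score(generate(200, watermark=False)), 2))
    print("MGDA weight:", mgda_two_task_weight([1.0, 0.0], [0.0, 1.0]))  # 0.5
```

In a training loop, `g1` and `g2` would be the gradients of a detection loss and a semantic loss with respect to the networks' parameters. When the min-norm combination is nonzero, a small step against it decreases both losses, which is how MGDA-style solvers trade off the two goals without a hand-tuned weighting.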