The advancement of Large Language Models (LLMs) has raised growing concerns about the misuse of AI-generated text, and watermarking LLM-generated text has emerged as a potential solution. However, it is challenging to generate high-quality watermarked text while maintaining strong security and robustness and still allowing the watermark to be detected without prior knowledge of the prompt or model. This paper proposes an adaptive watermarking strategy to address this problem. To improve text quality and maintain robustness, we add the watermark only to token distributions with high entropy, as measured by an auxiliary model, and leave low-entropy token distributions untouched. For security, and to further reduce the watermark's impact on text quality, we do not use a fixed green/red list generated from a random secret key, which can be vulnerable to decryption and forgery; instead, we adaptively scale up the output logits proportionally, based on the semantic embedding of previously generated text, using a well-designed semantic mapping model. Our experiments with various LLMs demonstrate that our approach achieves robustness comparable to that of existing watermarking methods. Additionally, the text generated by our method has perplexity comparable to that of \emph{un-watermarked} LLMs while maintaining security even under various attacks.
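The entropy-gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the entropy `threshold`, the strength `delta`, and the additive form of the logit adjustment are all assumptions made for illustration, and the `bias` vector merely stands in for the output of the semantic mapping model, which the abstract does not specify.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_watermark(logits, aux_logits, bias, threshold=1.0, delta=2.0):
    """Boost logits only when the auxiliary model's distribution is high-entropy.

    logits     -- next-token logits from the generating LLM
    aux_logits -- next-token logits from an auxiliary model (used for entropy only)
    bias       -- hypothetical per-token weights standing in for the output of
                  the semantic mapping model (assumption, not from the paper)
    threshold  -- hypothetical entropy cutoff, in nats
    delta      -- hypothetical watermark strength
    Returns (new_logits, was_watermarked).
    """
    if not (len(logits) == len(aux_logits) == len(bias)):
        raise ValueError("logits, aux_logits and bias must have equal length")
    h = entropy(softmax(aux_logits))
    if h < threshold:
        # Low entropy: leave the distribution untouched to protect text quality.
        return list(logits), False
    # High entropy: shift logits by the bias (additive form is an assumption).
    return [x + delta * b for x, b in zip(logits, bias)], True


# A peaked distribution is skipped; a flat one receives the watermark.
bias = [1.0, 0.0, 1.0, 0.0]
peaked, flag_peaked = adaptive_watermark([10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0], bias)
flat, flag_flat = adaptive_watermark([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], bias)
```

Gating on the auxiliary model's entropy rather than the generator's own is what would let a detector recompute the same gate without access to the prompt or the generating model, which matches the detection setting the abstract describes.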