Large language models (LLMs) generate high-quality responses that may nonetheless contain misinformation, underscoring the need for regulation that can distinguish AI-generated text from human-written text. Watermarking is pivotal in this context: it embeds hidden markers, imperceptible to humans, into text during the LLM inference phase. Achieving both high detectability of the inserted watermarks and high semantic quality of the generated text is challenging. While current watermarking algorithms have made promising progress in this direction, there remains significant scope for improvement. To address these challenges, we introduce a novel multi-objective optimization (MOO) approach to watermarking that uses lightweight networks to generate token-specific watermarking logits and splitting ratios. By using MOO to optimize both a detection objective and a semantic objective, our method achieves detectability and semantic integrity simultaneously. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of LLM-generated text while maintaining its semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.
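To make the idea of token-specific watermarking concrete, here is a minimal sketch of a green/red-list logit-bias watermark in which both the splitting ratio γ and the watermark logit δ vary per token. This is not the authors' implementation. It rests on several assumptions. First, "splitting ratio" is taken to mean the fraction of the vocabulary placed in a pseudo-random "green list" that is seeded by the previous token. Second, `gamma_net` and `delta_net` are hypothetical hand-written stand-ins for the trained lightweight networks. Third, the SHA-256 seeding, the toy vocabulary, and the random logits are illustrative only. Finally, detection uses a one-proportion z-test generalized to per-position γ_t. The MOO training of the networks is omitted entirely.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (assumption, for illustration only)


def gamma_net(prev_token: int) -> float:
    """Hypothetical stand-in for a lightweight network that outputs a
    token-specific splitting ratio in (0, 1)."""
    return 0.25 + 0.5 * ((prev_token * 37) % 100) / 100.0


def delta_net(prev_token: int) -> float:
    """Hypothetical stand-in for a lightweight network that outputs a
    token-specific, positive watermark logit (bias strength)."""
    return 1.0 + ((prev_token * 11) % 5) * 0.5


def green_list(prev_token: int, gamma: float, key: int = 42) -> set[int]:
    """Pseudo-randomly pick round(gamma * VOCAB_SIZE) 'green' token ids,
    seeded by a secret key and the previous token (assumed seeding scheme)."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    size = max(1, round(gamma * VOCAB_SIZE))
    return set(rng.sample(range(VOCAB_SIZE), size))


def watermark_logits(logits: list[float], prev_token: int) -> list[float]:
    """Add the token-specific watermark logit delta to every green-list entry."""
    greens = green_list(prev_token, gamma_net(prev_token))
    delta = delta_net(prev_token)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def z_score(tokens: list[int]) -> float:
    """Compare the observed green-token count with its expectation under the
    null hypothesis (unwatermarked text), using a per-position green fraction."""
    hits, expected, variance = 0, 0.0, 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        greens = green_list(prev, gamma_net(prev))
        g = len(greens) / VOCAB_SIZE  # effective green fraction at this step
        hits += cur in greens
        expected += g
        variance += g * (1 - g)
    return (hits - expected) / math.sqrt(variance)


def generate(n: int, watermark: bool, seed: int = 0) -> list[int]:
    """Greedy decoding over toy random logits, with or without the watermark."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(n):
        logits = [rng.uniform(0.0, 0.5) for _ in range(VOCAB_SIZE)]
        if watermark:
            logits = watermark_logits(logits, tokens[-1])
        tokens.append(max(range(VOCAB_SIZE), key=logits.__getitem__))
    return tokens


if __name__ == "__main__":
    print(z_score(generate(200, watermark=True)))   # large positive
    print(z_score(generate(200, watermark=False)))  # typically near 0
```

In this toy setup, every δ is at least 1.0 and every base logit is below 0.5. Greedy decoding therefore always picks a green token, which drives the z-score far above any usual detection threshold. The learned networks would instead trade this bias off against semantic quality, token by token.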