We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address these challenges, REMARK-LLM introduces three new components: (i) a learning-based message encoding module that infuses binary signatures into LLM-generated texts; (ii) a reparameterization module that transforms the dense distributions produced by the message encoding into the sparse distribution of the watermarked textual tokens; and (iii) a decoding module dedicated to signature extraction. Furthermore, we introduce an optimized beam search algorithm to guarantee the coherence and consistency of the generated content. REMARK-LLM is rigorously trained to encourage the preservation of semantic integrity in watermarked content while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability in inserting 2 times more signature bits into the same texts than prior art, all while maintaining semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.
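The abstract does not specify how the reparameterization module in (ii) is implemented. As an illustration only, one widely used way to turn a dense distribution over a vocabulary into a near one-hot (sparse) token choice while keeping training differentiable is the Gumbel-Softmax trick. The sketch below assumes that technique; the function names, the toy four-token vocabulary, and the temperature value are hypothetical and are not taken from REMARK-LLM.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed (soft) one-hot sample from `logits`.

    Assumption: Gumbel-Softmax stands in for the reparameterization
    step here; the abstract does not confirm this choice.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Discretize a relaxed sample into a sparse one-hot token vector."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


# Hypothetical dense scores over a toy 4-token vocabulary.
dense_logits = [2.0, 0.5, -1.0, 0.1]
rng = random.Random(0)
relaxed = gumbel_softmax(dense_logits, temperature=0.5, rng=rng)
sparse_token = to_one_hot(relaxed)
```

Lower temperatures push the relaxed sample closer to a one-hot vector, at the cost of higher-variance gradients during training; the value 0.5 above is arbitrary.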