Towards Codable Text Watermarking for Large Language Models

As large language models (LLMs) generate text with increasing fluency and realism, there is a growing need to identify the source of a text in order to prevent the abuse of LLMs. Text watermarking techniques, which inject hidden patterns into generated text, have proven reliable at determining whether a text was generated by an LLM. However, we argue that existing watermarking methods for LLMs are encoding-inefficient: they carry only one bit of information (whether or not the text was generated by an LLM) and cannot flexibly meet the diverse information-encoding needs of different LLM application scenarios, such as encoding the model version, generation time, or user ID. In this work, we conduct the first systematic study of Codable Text Watermarking for LLMs (CTWL), which allows text watermarks to carry more customizable information. First, we study the taxonomy of LLM watermarking techniques and give a mathematical formulation of CTWL. Second, we provide a comprehensive evaluation system for CTWL that covers (1) watermarking success rate, (2) robustness against various corruptions, (3) coding rate of the payload information, (4) encoding and decoding efficiency, and (5) impact on the quality of the generated text. Because these metrics trade off against one another and cannot all be improved at once, we devise a CTWL method named Balance-Marking, motivated by the goal of giving the vocabularies that are available and unavailable for encoding information approximately equal probability. Compared with the random vocabulary partitioning extended from existing work, a probability-balanced vocabulary partition can significantly improve the quality of the generated text. Extensive experimental results show that our method outperforms a direct baseline under this comprehensive evaluation.
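The contrast between a random vocabulary partition and a probability-balanced one can be illustrated with a small sketch. This is not the paper's implementation: the seeding scheme, the function names (`seeded_order`, `random_partition`, `balanced_partition`, `available_mass`), the 0.5 target, and the toy Zipf-like distribution are all assumptions made for illustration. The idea shown is only that a balanced split aims for roughly half of the next-token probability mass in the "available" set, whereas a random half of the vocabulary may hold much more or much less.

```python
import hashlib
import random


def seeded_order(vocab_size, message, context_key):
    """Pseudo-random token order derived from the message and a context key.

    The SHA-256 seeding here is a hypothetical choice, not taken from the paper.
    """
    digest = hashlib.sha256(f"{message}|{context_key}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = list(range(vocab_size))
    rng.shuffle(order)
    return order


def random_partition(vocab_size, message, context_key):
    """Baseline: the available set is a seeded random half of the vocabulary."""
    order = seeded_order(vocab_size, message, context_key)
    return set(order[: vocab_size // 2])


def balanced_partition(probs, message, context_key, target=0.5):
    """Add tokens in seeded order until the available set reaches `target` mass.

    The result can overshoot `target` by at most the largest single-token
    probability, so it is only approximately balanced.
    """
    order = seeded_order(len(probs), message, context_key)
    available, mass = set(), 0.0
    for tok in order:
        if mass >= target:
            break
        available.add(tok)
        mass += probs[tok]
    return available


def available_mass(probs, available):
    """Total probability of the tokens in the available set."""
    return sum(probs[t] for t in available)


# Toy next-token distribution (Zipf-like); purely illustrative data.
weights = [1.0 / (rank + 1) for rank in range(50)]
total = sum(weights)
probs = [w / total for w in weights]

for message in range(4):
    rand_set = random_partition(len(probs), message, context_key="ctx")
    bal_set = balanced_partition(probs, message, context_key="ctx")
    print(message,
          round(available_mass(probs, rand_set), 3),
          round(available_mass(probs, bal_set), 3))
```

In this sketch the balanced partition always lands between the target and the target plus the largest token probability, while the mass of the random half varies with the seed. In practice the probabilities would have to come from a model that the decoder can also run, which the sketch does not address.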
