In recent years, Large Language Models (LLMs) have made significant advances in text generation, demonstrating exceptional performance on downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, these generative abilities also pose risks, including the rapid spread of fake news, infringement of dataset and LLM copyrights, and challenges to academic integrity. Text watermarking has emerged as a potential solution. By embedding invisible yet detectable patterns in generated text, it helps track and verify the origin of text, thereby helping to prevent misuse and piracy. This survey comprehensively summarizes current text watermarking technologies, covering three main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their success rate, impact on text quality, robustness, and unforgeability; and (3) potential applications of text watermarking technologies. We aim to help researchers thoroughly understand text watermarking technologies and thereby foster their further development.