In recent years, Large Language Models (LLMs) have made significant advances in text generation, demonstrating exceptional performance on downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, these generative abilities also pose risks, such as the rapid spread of fake news, infringement of dataset and LLM copyrights, and challenges to academic integrity. Text watermarking has emerged as a potential solution: by embedding invisible yet detectable patterns in generated text, it helps track and verify the origin of a text, thereby helping to prevent misuse and piracy. This survey comprehensively summarizes current text watermarking technologies, covering three main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their success rate, impact on text quality, robustness, and unforgeability; and (3) potential applications of text watermarking. We hope it helps researchers thoroughly understand text watermarking technologies and thereby fosters their further development.