Distinguishing generated text from natural text is increasingly difficult. In this context, watermarking emerges as a promising technique for attributing generated text to a specific model. It alters the sampling process so as to leave an invisible trace in the generated output, which facilitates later detection. This research consolidates watermarks for large language models (LLMs) based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees, which remain valid even at low false-positive rates (less than $10^{-6}$). Second, we compare the effectiveness of watermarks using classical benchmarks in natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.