Google's SynthID-Text, the first production-ready generative watermarking system for large language models, introduces a novel tournament-based method that achieves state-of-the-art detectability for identifying AI-generated text. The system's innovation lies in: 1) a new tournament sampling algorithm for watermark embedding, 2) a detection strategy based on a score function (e.g., the Bayesian or mean score), and 3) a unified design that supports both distortionary and non-distortionary watermarking methods. This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. For example, we prove that the mean score is inherently vulnerable to an increased number of tournament layers, and we design a layer inflation attack to break SynthID-Text. We also prove that the Bayesian score offers improved watermark robustness with respect to the number of layers, and we further establish that the optimal Bernoulli distribution for watermark detection is achieved when its parameter is set to 0.5. Together, these theoretical and empirical insights not only deepen our understanding of SynthID-Text, but also open new avenues for analyzing effective watermark removal strategies and designing robust watermarking techniques. Source code is available at https://github.com/romidi80/Synth-ID-Empirical-Analysis.
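The abstract names tournament sampling and the mean score without defining them. The toy sketch below is not taken from this paper or its repository; it illustrates the general idea as it is publicly described for SynthID-Text, under simplifying assumptions. All function names (`g_value`, `tournament_sample`, `mean_score`) are hypothetical, the g-values are derived from a SHA-256 hash rather than the production keyed hash, each generation step gets an arbitrary integer seed (the real system is understood to derive it from context and a secret key), and the language model is replaced by a uniform toy distribution.

```python
import hashlib
import random


def g_value(token: int, layer: int, step_seed: int, p: float = 0.5) -> int:
    """Toy pseudorandom Bernoulli(p) g-value keyed on (step_seed, layer, token).

    Assumption: SHA-256 stands in for the watermark's keyed random function.
    """
    digest = hashlib.sha256(f"{step_seed}:{layer}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return 1 if u < p else 0


def tournament_sample(probs, layers, step_seed, rng):
    """Draw 2**layers candidates from probs and run a knockout tournament.

    In each layer, candidates are paired; the one with the larger g-value
    for that layer advances, and ties are broken uniformly at random.
    """
    candidates = rng.choices(range(len(probs)), weights=probs, k=2**layers)
    for layer in range(layers):
        winners = []
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga = g_value(a, layer, step_seed)
            gb = g_value(b, layer, step_seed)
            if ga == gb:
                winners.append(rng.choice([a, b]))
            else:
                winners.append(a if ga > gb else b)
        candidates = winners
    return candidates[0]


def mean_score(tokens, step_seeds, layers):
    """Average g-value over all tokens and all tournament layers."""
    total = sum(
        g_value(tok, layer, seed)
        for tok, seed in zip(tokens, step_seeds)
        for layer in range(layers)
    )
    return total / (len(tokens) * layers)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, layers, length = 50, 3, 400
    probs = [1 / vocab_size] * vocab_size        # toy stand-in for an LLM
    step_seeds = list(range(1000, 1000 + length))  # hypothetical per-step seeds

    watermarked = [tournament_sample(probs, layers, s, rng) for s in step_seeds]
    unwatermarked = rng.choices(range(vocab_size), weights=probs, k=length)

    print("mean score, watermarked:  ", mean_score(watermarked, step_seeds, layers))
    print("mean score, unwatermarked:", mean_score(unwatermarked, step_seeds, layers))
```

In this toy setting, unwatermarked tokens have g-values that behave like fair coin flips, so their mean score should sit near 0.5, while tournament winners are biased toward g = 1 in every layer and should score noticeably higher. A detector would threshold this gap; the sketch does not model the Bayesian score or the layer inflation attack discussed in the abstract.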