LLM watermarks enable tracing of AI-generated text by embedding a detectable signal in the model's output. Recent works have proposed a wide range of watermarking algorithms, each with a distinct design, usually built in a bottom-up fashion. Crucially, there is no general and principled formulation of LLM watermarking. In this work, we show that most existing and widely used watermarking schemes can in fact be derived from a principled constrained optimization problem. Our formulation unifies existing watermarking methods and makes explicit the constraints that each method optimizes. In particular, it highlights an understudied quality-diversity-power trade-off. At the same time, our framework provides a principled approach for designing novel watermarking schemes tailored to specific requirements. For instance, it allows us to use perplexity directly as a proxy for quality and to derive new schemes that are optimal with respect to this constraint. Our experimental evaluation validates the framework: watermarking schemes derived from a given constraint consistently maximize detection power with respect to that constraint.
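To make the setting concrete, the sketch below illustrates one widely used family of schemes that such a framework would subsume: a "green-list" watermark, in which the vocabulary is pseudo-randomly split based on the preceding token, green-token logits are boosted by a bias `delta`, and detection computes a z-score on the fraction of green tokens. This is a minimal illustration with assumed parameter names (`delta`, `frac`); real implementations key the split with a secret keyed hash rather than the plain PRNG used here.

```python
import math
import random


def green_list(prev_token: int, vocab_size: int, frac: float = 0.5) -> set:
    """Pseudo-randomly select a 'green' subset of the vocabulary, keyed by
    the previous token. (A real scheme would use a secret keyed hash.)"""
    rng = random.Random(prev_token)
    return set(rng.sample(range(vocab_size), int(frac * vocab_size)))


def watermark_logits(logits, prev_token, delta=2.0, frac=0.5):
    """Embed the signal at generation time: add a bias `delta` to the
    logits of green-list tokens, nudging sampling toward them."""
    green = green_list(prev_token, len(logits), frac)
    return [l + delta if i in green else l for i, l in enumerate(logits)]


def detect_zscore(tokens, vocab_size, frac=0.5):
    """Detection: count green tokens and compare against the fraction
    expected under the null hypothesis (no watermark) via a z-score."""
    n = len(tokens) - 1
    hits = sum(
        1
        for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, vocab_size, frac)
    )
    return (hits - frac * n) / math.sqrt(frac * (1 - frac) * n)


if __name__ == "__main__":
    vocab_size = 100

    # Simulate heavily watermarked text: every token lands in the green list.
    wm_tokens = [0]
    for _ in range(50):
        wm_tokens.append(min(green_list(wm_tokens[-1], vocab_size)))

    # Simulate unwatermarked text: uniformly random tokens.
    rng = random.Random(123)
    plain_tokens = [rng.randrange(vocab_size) for _ in range(51)]

    print(detect_zscore(wm_tokens, vocab_size))     # large positive z-score
    print(detect_zscore(plain_tokens, vocab_size))  # z-score near zero
```

The bias `delta` controls the trade-off the abstract highlights: a larger `delta` increases detection power but distorts the output distribution, degrading quality and diversity.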