A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provably distinguishing LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key, provided by the LLM to the verifier, so that the false positive rate (the error of mistakenly detecting human-written text as LLM-generated) can be controlled. Next, the framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression for the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks, one of which has been internally implemented at OpenAI, and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. Numerical experiments show that these theoretically derived detection rules are competitive with, and sometimes more powerful than, existing detection approaches.
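To make the role of a pivotal statistic concrete, the following is a minimal sketch of a detection rule whose false positive rate is controlled exactly. It is not the paper's optimal rule. It assumes (this is an illustrative assumption, not stated in the abstract) that each token yields a pivot value that is i.i.d. Uniform(0, 1) when the text is human-written, and it uses the placeholder score h(y) = -log(1 - y), so that the summed score of n tokens follows a Gamma(n, 1) distribution under the null hypothesis. The function names `erlang_sf` and `detect`, and the way watermarked pivots are simulated, are hypothetical.

```python
import math
import random


def erlang_sf(x, n):
    """Return P(S > x) for S ~ Gamma(shape=n, rate=1), with n a positive integer.

    Uses the closed form exp(-x) * sum_{k=0}^{n-1} x^k / k!, evaluated in
    log space so that large n or x do not overflow.
    """
    if x <= 0:
        return 1.0
    log_terms = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


def detect(pivots, alpha=0.01):
    """Flag text as watermarked if the p-value of the summed score is <= alpha.

    Assumption: under the null (human-written text) each pivot is
    Uniform(0, 1), so -log(1 - y) is Exp(1) and the sum is Gamma(n, 1).
    Pivots must lie in [0, 1).
    """
    n = len(pivots)
    score = sum(-math.log(1.0 - y) for y in pivots)
    p_value = erlang_sf(score, n)
    return p_value <= alpha, p_value


if __name__ == "__main__":
    rng = random.Random(0)
    n_tokens = 200
    # Simulated human-written text: pivots are uniform under the null.
    human = [rng.random() for _ in range(n_tokens)]
    # Simulated watermarked text: pivots skewed toward 1 (hypothetical model).
    marked = [rng.random() ** 0.5 for _ in range(n_tokens)]
    print("human :", detect(human))
    print("marked:", detect(marked))
```

Because the null distribution of the summed score is known exactly under the stated assumption, the threshold depends only on n and alpha, not on the language model or the text. Choosing a different score function changes the power of the test but not its false positive rate, which is the design question the framework addresses.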
