Language models now routinely produce text that is difficult to distinguish from human writing, raising the need for robust tools to verify content provenance. Watermarking has emerged as a promising countermeasure, with existing work largely focused on preserving model quality and on robust detection. However, current schemes provide limited protection against false attribution. We strengthen the notion of soundness by introducing two novel guarantees: unforgeability and recoverability. Unforgeability prevents adversaries from crafting false positives, that is, texts that are far from every output of the watermarked model yet are nonetheless flagged as watermarked. Recoverability provides an additional layer of protection: whenever a watermark is detected, the detector identifies the source text from which the flagged content was derived. Together, these properties strengthen content ownership by linking content exclusively to its generating model, enabling secure attribution and fine-grained traceability. We construct the first undetectable watermarking scheme that is robust, unforgeable, and recoverable with respect to substitutions (i.e., perturbations in the Hamming metric). The key technical ingredient is a new cryptographic primitive called robust (or recoverable) digital signatures, which allow verification of messages that are close to signed ones while preventing forgery of messages that are far from all previously signed messages. We show that any standard digital signature scheme can be boosted to a robust one using property-preserving hash functions (Boyle, LaVigne, and Vaikuntanathan, ITCS 2019).
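To make the interface of a robust (recoverable) signature concrete, the following is a minimal toy sketch of the intended behavior only. It is not the construction from this work: it uses an HMAC as a symmetric stand-in for a digital signature, stores the signed message in the clear inside the signature instead of using property-preserving hash functions, and all names (`sign`, `verify_and_recover`, `delta`) are hypothetical. It illustrates that a message within Hamming distance `delta` of a signed message verifies and yields the source, while a far message is rejected.

```python
import hashlib
import hmac
from typing import Optional, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def sign(key: bytes, msg: bytes) -> Tuple[bytes, bytes]:
    """Toy 'signature': the message itself plus an HMAC tag over it.

    Assumption: HMAC stands in for a real (public-key) signature scheme.
    """
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    return msg, tag


def verify_and_recover(
    key: bytes, candidate: bytes, sig: Tuple[bytes, bytes], delta: int
) -> Optional[bytes]:
    """Return the signed source message if `candidate` is within `delta`
    substitutions of it and the tag is valid; otherwise return None."""
    signed_msg, tag = sig
    expected = hmac.new(key, signed_msg, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # the embedded message was not signed under this key
    if len(candidate) != len(signed_msg):
        return None  # only substitutions are tolerated, not insertions/deletions
    if hamming(candidate, signed_msg) > delta:
        return None  # candidate is far from the signed message
    return signed_msg  # recoverability: the source text is returned


key = b"k" * 32
source = b"the quick brown fox"
sig = sign(key, source)
near = b"the quick brawn fox"  # one substitution
far = b"xxx quick brawn foo"   # five substitutions
recovered = verify_and_recover(key, near, sig, delta=2)
rejected = verify_and_recover(key, far, sig, delta=2)
```

In this sketch the signature is as long as the message; avoiding that kind of blow-up is one reason a compact, distance-aware hash would be attractive, though the sketch makes no claim about how the actual scheme is built.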