Text watermarking algorithms for large language models (LLMs) have recently been proposed to mitigate the potential harms of LLM-generated text, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used during watermark generation, which makes them susceptible to security breaches and counterfeiting when detection is made public. To address this limitation, we propose UPV, an unforgeable publicly verifiable watermark algorithm that uses two different neural networks for watermark generation and detection instead of the same key at both stages. The token embedding parameters are shared between the generation and detection networks, which allows the detection network to reach high accuracy efficiently. Experiments demonstrate that our neural-network-based algorithm attains high detection accuracy and computational efficiency. Further analysis confirms the high complexity of forging the watermark from the detection network. Our code is available at \href{https://github.com/THU-BPM/unforgeable_watermark}{https://github.com/THU-BPM/unforgeable\_watermark}. Our algorithm can also be accessed through MarkLLM \citep{pan2024markllm}\footnote{https://github.com/THU-BPM/MarkLLM}.