Text watermarking algorithms for large language models (LLMs) have recently been used to mitigate the potential harms of LLM-generated text, including fake news and copyright issues. However, watermark detection in current text watermarking algorithms requires the key used during generation, which leaves them susceptible to breaches and counterfeiting. In this work, we propose the first private watermarking algorithm, which extends current text watermarking algorithms by using two different neural networks for watermark generation and detection, respectively, rather than the same key at both stages. In addition, the watermark generation and detection networks share part of their parameters, which allows the detection network to reach high accuracy efficiently. Experiments show that our algorithm achieves high detection accuracy with minimal impact on generation and detection speed, owing to the small parameter size of both networks. Our subsequent analysis further demonstrates the difficulty of recovering the watermark generation rules from the detection network.
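The parameter sharing described in the abstract can be illustrated with a minimal structural sketch. It assumes, purely for illustration, that the shared parameters are a token embedding table, that the generation network labels the last token of a fixed-size window, and that the detection network scores a whole token sequence. The class names (`SharedEmbedding`, `Generator`, `Detector`), the layer shapes, and the random untrained weights are all hypothetical and are not taken from this work, so the printed score carries no meaning beyond showing the data flow.

```python
import math
import random


class SharedEmbedding:
    """Token embedding table reused by both networks (the assumed shared parameters)."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token_id):
        return self.table[token_id]


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class Generator:
    """Hypothetical generation network: flags the last token of a window."""

    def __init__(self, embedding, window, dim, seed=1):
        rng = random.Random(seed)
        self.embedding = embedding
        self.window = window
        # Head weights that belong to the generation side only.
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(window * dim)]

    def is_green(self, window_tokens):
        assert len(window_tokens) == self.window
        features = [x for t in window_tokens for x in self.embedding(t)]
        return sum(w * x for w, x in zip(self.weights, features)) > 0.0


class Detector:
    """Hypothetical detection network: own head, same embedding object."""

    def __init__(self, embedding, dim, seed=2):
        rng = random.Random(seed)
        self.embedding = embedding
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def score(self, tokens):
        # Mean-pool the token embeddings, then apply a linear head and sigmoid.
        dim = len(self.weights)
        pooled = [sum(self.embedding(t)[i] for t in tokens) / len(tokens)
                  for i in range(dim)]
        return sigmoid(sum(w * x for w, x in zip(self.weights, pooled)))


embedding = SharedEmbedding(vocab_size=50, dim=8)
generator = Generator(embedding, window=3, dim=8)
detector = Detector(embedding, dim=8)

tokens = [4, 17, 23, 8, 42, 15]
# One flag per position that has a full window of three tokens ending at it.
green_flags = [generator.is_green(tokens[i - 2:i + 1])
               for i in range(2, len(tokens))]
print(green_flags, detector.score(tokens))
```

In this sketch the detector never reads `generator.weights`; the two sides have only the embedding table in common, which is the structural point the abstract makes about using separate networks instead of one shared key.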