Text watermarking algorithms for large language models (LLMs) can mitigate the potential harms of LLM-generated text, including fake news and copyright issues. However, watermark detection in current text watermarking algorithms requires the key used during generation, which makes these algorithms susceptible to breaches and counterfeiting. In this work, we propose the first private watermarking algorithm, which extends current text watermarking algorithms by using two different neural networks for watermark generation and detection, respectively, rather than the same key at both stages. In addition, the generation and detection networks share part of their parameters, which allows the detection network to reach high accuracy very efficiently. Experiments show that our algorithm ensures high detection accuracy with minimal impact on generation and detection speed, owing to the small parameter size of both networks. Our subsequent analysis further demonstrates the difficulty of recovering the watermark generation rules from the detection network.
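To make the two-network structure concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes, purely for illustration, that the shared parameters are a token embedding table, that the generation network labels half of the vocabulary "green" given the preceding tokens, and that the detection network pools the shared embeddings into a single score. All class and function names (`SharedEmbedding`, `GenerationNet`, `DetectionNet`, `generate_watermarked`, `green_fraction`) are hypothetical, the weights are random, and training of the detection network is omitted.

```python
import math
import random


class SharedEmbedding:
    """Token embedding table used by BOTH networks (the assumed shared part)."""

    def __init__(self, vocab_size, dim, rng):
        self.vocab_size = vocab_size
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def __call__(self, token_id):
        return self.table[token_id]


class GenerationNet:
    """Private side: decides which tokens are 'green' given the last few tokens."""

    def __init__(self, emb, dim, context, rng):
        self.emb = emb
        self.context = context  # number of preceding tokens used (must be >= 1)
        self.w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # private weights

    def green_set(self, context_ids):
        # Sum the context embeddings, then rank every candidate token by a
        # bilinear score; the top half of the vocabulary is labelled green.
        ctx = [sum(col) for col in zip(*(self.emb(t) for t in context_ids))]

        def score(t):
            return sum(w * c * e for w, c, e in zip(self.w, ctx, self.emb(t)))

        ranked = sorted(range(self.emb.vocab_size), key=score, reverse=True)
        return set(ranked[: self.emb.vocab_size // 2])


class DetectionNet:
    """Public side: reuses the embedding but never sees GenerationNet.w."""

    def __init__(self, emb, dim, rng):
        self.emb = emb
        self.head = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # untrained here

    def score(self, token_ids):
        n = len(token_ids)
        pooled = [sum(self.emb(t)[i] for t in token_ids) / n
                  for i in range(len(self.head))]
        logit = sum(h * p for h, p in zip(self.head, pooled))
        return 1.0 / (1.0 + math.exp(-logit))  # value in (0, 1)


def generate_watermarked(gen, length, rng):
    """Stand-in for an LLM: sample each new token from the green set only."""
    ids = [rng.randrange(gen.emb.vocab_size) for _ in range(gen.context)]
    while len(ids) < length:
        green = sorted(gen.green_set(ids[-gen.context:]))
        ids.append(rng.choice(green))
    return ids


def green_fraction(gen, ids):
    """Key-holder check: fraction of tokens that fall in their green set."""
    k = gen.context
    hits = [ids[i] in gen.green_set(ids[i - k:i]) for i in range(k, len(ids))]
    return sum(hits) / len(hits)


rng = random.Random(0)
emb = SharedEmbedding(vocab_size=50, dim=8, rng=rng)
gen = GenerationNet(emb, dim=8, context=2, rng=rng)
det = DetectionNet(emb, dim=8, rng=rng)
text = generate_watermarked(gen, length=40, rng=rng)
print(green_fraction(gen, text))  # 1.0 by construction
print(det.score(text))            # meaningless until the detector is trained
```

In this sketch, `green_fraction` plays the role of the signal a trained detection network would have to approximate from labelled examples. Only `emb` is common to both sides, so publishing `det` would not directly expose `gen.w`.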