The remarkable generation performance of large language models (LLMs) has raised ethical and legal concerns about their use, such as plagiarism and copyright issues. In response, several approaches to watermarking and detecting LLM-generated text have recently been proposed. However, we find that these methods do not function properly on code generation tasks because of the syntactic and semantic characteristics of code. Building on \citet{Kirchenbauer2023watermark}, we propose a new watermarking method, Selective WatErmarking via Entropy Thresholding (SWEET), which promotes "green" tokens only at positions where the token distribution has high entropy during generation, thereby preserving the correctness of the generated code. Watermarked code is detected with a statistical test whose z-score is computed using the entropy information. Our experiments on HumanEval and MBPP show that SWEET significantly improves the Pareto frontier between code correctness and watermark detection performance. We also show that notable post-hoc detection methods (e.g., DetectGPT) fail to work well on this task. Finally, we show that setting a reasonable entropy threshold is not much of a challenge. Code is available at https://github.com/hongcheki/sweet-watermark.
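To make the selective mechanism concrete, the following is a minimal sketch of entropy-thresholded watermarking and detection. It is an illustration under stated assumptions, not the released implementation: the function names, the hash key, the default values of `gamma` and `delta`, the strict comparison against the threshold `tau`, and the seeding of the green list from only the previous token are all hypothetical choices. The z-score uses the standard one-proportion form from the green/red-list scheme, restricted here to high-entropy positions.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def green_ids(prev_token, vocab_size, gamma=0.5, key=15485863):
    """Pseudo-randomly pick a 'green' subset of the vocabulary.

    Assumption: the subset is seeded by the previous token id and a
    hypothetical secret key; gamma is the green fraction.
    """
    rng = random.Random(key * (prev_token + 1))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def sweet_logits(logits, prev_token, tau, gamma=0.5, delta=2.0):
    """Add delta to green-token logits only when entropy exceeds tau."""
    if entropy(softmax(logits)) <= tau:
        return list(logits)  # low entropy: leave the distribution untouched
    green = green_ids(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sweet_z_score(tokens, entropies, vocab_size, tau, gamma=0.5):
    """z-score of the green-token count over high-entropy positions only.

    entropies[t] is assumed to be the entropy of the model's distribution
    at the step that produced tokens[t]; position 0 has no predecessor
    and is skipped.
    """
    scored = 0
    green = 0
    for t in range(1, len(tokens)):
        if entropies[t] <= tau:
            continue  # position was not watermarked, so it is not scored
        scored += 1
        if tokens[t] in green_ids(tokens[t - 1], vocab_size, gamma):
            green += 1
    if scored == 0:
        return 0.0
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

In this sketch, a peaked distribution (for example, a position where code syntax forces the next token) passes through unchanged, while a flat distribution has its green tokens boosted. Detection then ignores the low-entropy positions, so they do not dilute the statistic.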