Large language models now generate text and code well enough to raise ethical and legal concerns about their use, such as plagiarism and copyright infringement. In response, several approaches to watermarking and detecting LLM-generated text have recently been proposed. However, we find that these methods do not work well on code generation tasks because of the syntactic and semantic characteristics of code. Building on \citet{Kirchenbauer2023watermark}, we propose a new watermarking method, Selective WatErmarking via Entropy Thresholding (SWEET), which promotes "green" tokens only at positions where the token distribution has high entropy during generation, thereby preserving the correctness of the generated code. Watermarked code is detected with a statistical test and z-score based on the entropy information. Our experiments on HumanEval and MBPP show that SWEET significantly improves the Pareto frontier between code correctness and watermark detection performance. We also show that notable post-hoc detection methods (e.g., DetectGPT) do not work well on this task. Finally, we show that setting a reasonable entropy threshold is not much of a challenge. Code is available at https://github.com/hongcheki/sweet-watermark.
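To make the idea of selective watermarking concrete, the following is a minimal sketch and not the released implementation. It assumes a green list seeded by the previous token (in the style of \citet{Kirchenbauer2023watermark}), a green-list fraction `gamma`, a logit bias `delta`, and an entropy threshold `tau`; the function names and default values are hypothetical and chosen only for illustration. The z-score shown is the usual one-proportion statistic, counted only over tokens whose entropy exceeds the threshold.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def green_list(prev_token, vocab_size, gamma):
    """Pseudo-random green list seeded by the previous token (assumed scheme)."""
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def sweet_bias(logits, prev_token, gamma=0.5, delta=2.0, tau=1.0):
    """Add delta to green-token logits only when the entropy exceeds tau."""
    if entropy(softmax(logits)) <= tau:
        return list(logits)  # low entropy: leave the distribution untouched
    green = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sweet_z_score(tokens, entropies, vocab_size, gamma=0.5, tau=1.0):
    """z-score over high-entropy positions only.

    tokens[0] is treated as a prefix token; entropies[t] is the entropy of
    the distribution from which tokens[t] was sampled.
    """
    scored = 0
    green_hits = 0
    for t in range(1, len(tokens)):
        if entropies[t] <= tau:
            continue  # position was not watermarked, so it is not counted
        scored += 1
        if tokens[t] in green_list(tokens[t - 1], vocab_size, gamma):
            green_hits += 1
    if scored == 0:
        return 0.0
    return (green_hits - gamma * scored) / math.sqrt(gamma * (1 - gamma) * scored)
```

In this sketch a detector would flag a sequence as watermarked when `sweet_z_score` exceeds a chosen cutoff. Skipping low-entropy positions in both functions is what distinguishes the selective scheme from biasing every token: near-deterministic tokens, which are common in code, are neither altered during generation nor counted against the watermark during detection.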