Amid growing concerns that large language models (LLMs) are being misused to generate misinformation or complete homework assignments, watermarking has emerged as an effective solution for distinguishing human-written text from LLM-generated text. A prominent watermarking strategy embeds a signal into generated text by upsampling a pseudorandomly chosen subset of tokens at every generation step. Although this signal is imperceptible to a human reader, it is detectable through statistical testing. However, implanting such signals alters the model's output distribution and can have unintended effects when watermarked LLMs are used in downstream applications. In this work, we evaluate the performance of watermarked LLMs on a diverse suite of tasks, including text classification, textual entailment, reasoning, question answering, translation, summarization, and language modeling. We find that, in the average case, watermarking has a negligible impact on tasks posed as k-class classification problems. However, in some scenarios that occur with non-negligible probability, accuracy can plummet to that of a random classifier. Tasks cast as multiple-choice questions or short-form generation are, surprisingly, unaffected by watermarking. For long-form generation tasks, including summarization and translation, we see a 15-20% drop in performance due to watermarking. Our findings highlight the trade-offs that users should be aware of when using watermarked models and point to cases where future research could improve on existing trade-offs.
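To make the watermarking strategy described above concrete, the following is a minimal sketch of one way such a scheme could work. It is not the implementation evaluated in this work. The names `favored_subset`, `sample_token`, `generate`, and `z_score`, the constants `VOCAB_SIZE`, `GAMMA`, and `DELTA`, the choice to seed the subset from the previous token, and the random logits standing in for a real model are all assumptions made for illustration.

```python
import hashlib
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
VOCAB_SIZE = 50   # toy vocabulary size
GAMMA = 0.5       # fraction of the vocabulary placed in the favored subset
DELTA = 4.0       # logit bias added to favored tokens (the "upsampling")


def favored_subset(prev_token: int, key: int = 42) -> set:
    """Pseudorandomly choose the favored tokens, seeded by a key and the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_token(logits, prev_token, rng, watermark):
    """Sample one token from softmax(logits), optionally biasing the favored subset."""
    if watermark:
        favored = favored_subset(prev_token)
        logits = [l + DELTA if i in favored else l for i, l in enumerate(logits)]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]  # unnormalized softmax
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(n_tokens, watermark, seed=0):
    """Generate tokens; random Gaussian logits stand in for a real language model."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_token(logits, tokens[-1], rng, watermark))
    return tokens


def z_score(tokens):
    """One-proportion z-test: are favored tokens over-represented relative to GAMMA?"""
    n = len(tokens) - 1
    hits = sum(
        tokens[i] in favored_subset(tokens[i - 1]) for i in range(1, len(tokens))
    )
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z-score:  ", z_score(generate(200, watermark=True)))
    print("unwatermarked z-score:", z_score(generate(200, watermark=False)))
```

In this sketch, a detector that knows the key can recompute each step's favored subset and flag text whose z-score exceeds a chosen threshold; the threshold itself is a design choice not specified here. The bias `DELTA` is also where the distribution shift discussed above enters: a larger bias makes detection easier but moves the sampling distribution further from the unwatermarked model.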