The fluency and factual knowledge of large language models (LLMs) heighten the need for systems that can detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging whether a passage was generated by a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage produced by another generic pre-trained language model (e.g., T5). We find that DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by the 20B-parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.
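To make the curvature criterion concrete, here is a minimal sketch, assuming the score compares the log probability of the original passage with the average log probability of its randomly perturbed copies: if the passage sits near a local maximum (negative curvature), most perturbations lower its log probability and the score is positive. Everything model-specific is a hypothetical stand-in: `toy_log_prob` is a unigram model in place of the LLM's log probability, and `make_toy_perturb` swaps one random word in place of T5 mask-filling. Dividing by the standard deviation of the perturbed log probabilities is an optional normalization included here as an assumption, not a detail stated above.

```python
import math
import random
import statistics
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 100,
    normalize: bool = True,
) -> float:
    """Monte Carlo estimate of log p(x) - mean over perturbations of log p(x_tilde)."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    score = original - statistics.mean(perturbed)
    # Optional (assumed) normalization by the spread of the perturbed log probs.
    if normalize and n_perturbations > 1:
        std = statistics.stdev(perturbed)
        if std > 0:
            score /= std
    return score


# Hypothetical toy "source model": unigram probabilities, tiny mass for unknown words.
TOY_PROBS = {"the": 0.3, "cat": 0.2, "sat": 0.2, "on": 0.15, "mat": 0.15}
OOV_PROB = 1e-4


def toy_log_prob(text: str) -> float:
    """Sum of unigram log probabilities under the toy model."""
    return sum(math.log(TOY_PROBS.get(w, OOV_PROB)) for w in text.split())


# Hypothetical perturbation vocabulary standing in for a mask-filling model like T5.
REPLACEMENTS = ["the", "cat", "sat", "on", "mat", "zebra", "quartz", "violin"]


def make_toy_perturb(rng: random.Random) -> Callable[[str], str]:
    """Return a perturbation function that replaces one random word."""
    def perturb(text: str) -> str:
        words = text.split()
        i = rng.randrange(len(words))
        words[i] = rng.choice(REPLACEMENTS)
        return " ".join(words)
    return perturb


if __name__ == "__main__":
    perturb = make_toy_perturb(random.Random(0))
    # High-likelihood text under the toy model: perturbations tend to lower log p.
    print(perturbation_discrepancy("the cat sat on the mat", toy_log_prob, perturb, 1000))
    # Text with rare words: perturbations tend to raise log p, so the score is negative.
    print(perturbation_discrepancy("zebra quartz violin on the mat", toy_log_prob, perturb, 1000))
```

In this toy setting the first passage scores positive and the second negative; a detector would then flag a passage as model-generated when its score exceeds some threshold chosen on held-out data. Real use would replace both stand-ins with the LLM being audited and a genuine mask-filling model.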