With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written has become both more challenging and more important, as such models could be used to spread misinformation, fake news, and fake reviews, and to mimic certain authors and figures. To this end, a slew of methods has been proposed to detect machine-generated text. Most of these methods need access to the logits of the target model or the ability to sample from it. One such black-box detection method relies on the observation that generated text is locally optimal under the likelihood function of the generator, while human-written text is not. We find that, overall, smaller and partially trained models are better universal text detectors: they can more precisely detect text generated by both small and larger models. Interestingly, we find that whether the detector and generator were trained on the same data is not critically important to detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45.
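The local-optimality observation and the AUC figures quoted above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `curvature_score`, `auc`, `toy_log_likelihood`, and `toy_perturb` are hypothetical, and the unigram table is a toy stand-in for a real detector model's log-likelihood. The idea shown is only that a text sitting at a local optimum of the likelihood loses log-likelihood under small perturbations, so the gap between the original and the perturbed scores can serve as a detection statistic.

```python
import math
import random
from statistics import mean
from typing import Callable, Sequence

# Toy unigram "model" (an assumption for illustration, not a real detector).
TOY_PROBS = {"the": 0.5, "cat": 0.2, "sat": 0.2, "mat": 0.1}
UNKNOWN_PROB = 1e-6  # probability assigned to any word outside TOY_PROBS


def toy_log_likelihood(text: str) -> float:
    """Sum of per-word log-probabilities under the toy unigram table."""
    return sum(math.log(TOY_PROBS.get(w, UNKNOWN_PROB)) for w in text.split())


def toy_perturb(text: str, rng: random.Random) -> str:
    """Replace one randomly chosen word with an out-of-vocabulary token."""
    words = text.split()
    words[rng.randrange(len(words))] = "zzz"
    return " ".join(words)


def curvature_score(
    text: str,
    log_likelihood: Callable[[str], float],
    perturb: Callable[[str, random.Random], str],
    n_perturbations: int = 20,
    seed: int = 0,
) -> float:
    """Original log-likelihood minus the mean over perturbed copies.

    A large positive value suggests the text is near a local optimum
    of the scoring model's likelihood.
    """
    rng = random.Random(seed)
    original = log_likelihood(text)
    perturbed = [log_likelihood(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - mean(perturbed)


def auc(machine_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Probability that a random machine text outscores a random human text.

    Ties count as one half; 0.5 corresponds to chance-level detection.
    """
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))
```

In a real setting, `toy_log_likelihood` would be replaced by the log-likelihood from the detector model (which, per the finding above, need not be the generator), and `auc` would be computed over the scores of a labeled set of machine-generated and human-written texts. Under this reading, an AUC of 0.45 is slightly worse than chance, while 0.81 indicates substantial separation.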