Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also makes it significantly harder to determine the origin of a given text, and research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limited flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then feed only the preceding portion to the LLM to regenerate the remaining part. By comparing the original and regenerated remaining parts, through N-gram analysis in the black-box setting or probability divergence in the white-box setting, we reveal significant discrepancies between the distribution of machine-generated text and that of human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach achieves state-of-the-art performance in distinguishing between human-written and GPT-generated text on four English datasets and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of texts. Additionally, our method provides reasonable explanations and evidence to support its decisions, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally address model sourcing. Code is available at https://github.com/Xianjun-Yang/DNA-GPT.
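The black-box procedure described in the abstract (truncate, regenerate, compare N-grams) can be illustrated with a minimal sketch. This is not the released DNA-GPT implementation: the function names, the whitespace tokenization, the default parameters, and the simple averaged overlap score are all assumptions made for illustration, and `regenerate` is a hypothetical callable standing in for whatever LLM API is queried.

```python
from typing import Callable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(
    text: str,
    regenerate: Callable[[str], str],   # hypothetical LLM wrapper: prefix -> continuation
    num_samples: int = 5,               # assumed number of regenerations
    truncate_ratio: float = 0.5,        # "truncate it in the middle"
    n_values: Tuple[int, ...] = (2, 3, 4),  # assumed n-gram orders
) -> float:
    """Average fraction of the original remainder's n-grams that reappear
    in continuations regenerated from the truncated prefix.

    Simplified stand-in for an N-gram divergence score; whitespace
    tokenization and uniform averaging are illustrative assumptions.
    """
    tokens = text.split()
    cut = int(len(tokens) * truncate_ratio)
    prefix, remainder = tokens[:cut], tokens[cut:]

    scores = []
    for _ in range(num_samples):
        # Only the preceding portion is given to the model.
        continuation = regenerate(" ".join(prefix)).split()
        for n in n_values:
            original = ngrams(remainder, n)
            if not original:
                continue  # remainder too short for this n
            shared = original & ngrams(continuation, n)
            scores.append(len(shared) / len(original))
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a higher score means the model tends to reproduce the original remainder, which would be consistent with machine-generated text, while human-written text would be expected to score lower. Any decision threshold would need to be calibrated on held-out data; none is implied here.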