The potential misuse of ChatGPT and other Large Language Models (LLMs) has raised concerns regarding the dissemination of false information, plagiarism, academic dishonesty, and fraudulent activities. Consequently, distinguishing between AI-generated and human-generated content has emerged as an intriguing research topic. However, current text detection methods lack precision and are often restricted to specific tasks or domains, making them inadequate for identifying content generated by ChatGPT. In this paper, we propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content. Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods. DEMASQ is an energy-based detection model that incorporates novel aspects, such as (i) optimization inspired by the Doppler effect to capture the interdependence between input text embeddings and output labels, and (ii) the use of explainable AI techniques to generate diverse perturbations. To evaluate our detector, we create a benchmark dataset comprising a mixture of prompts from both ChatGPT and humans, encompassing domains such as medical, open Q&A, finance, wiki, and Reddit. Our evaluation demonstrates that DEMASQ achieves high accuracy in identifying content generated by ChatGPT.