SeqXGPT: Sentence-Level AI-Generated Text Detection

Widely deployed large language models (LLMs) can generate human-like content, raising concerns about their abuse. It is therefore important to build strong AI-generated text (AIGT) detectors. Existing work considers only document-level AIGT detection. In this paper, we first introduce a sentence-level detection challenge by synthesizing a dataset of documents polished with LLMs, that is, documents containing both sentences written by humans and sentences modified by LLMs. We then propose Sequence X (Check) GPT (SeqXGPT), a novel method that uses log probability lists from white-box LLMs as features for sentence-level AIGT detection. These features are composed like waves in speech processing and cannot be studied by LLMs, so we build SeqXGPT on convolution and self-attention networks. We test it on both sentence-level and document-level detection challenges. Experimental results show that previous methods struggle with sentence-level AIGT detection, while our method not only significantly surpasses baseline methods on both sentence-level and document-level detection challenges but also exhibits strong generalization capabilities.
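As a rough illustration of the idea of treating per-token log probabilities as a wave-like signal, the sketch below computes a log probability list, smooths it with a small 1-D convolution, and averages it per sentence. It is not the SeqXGPT implementation: the add-one-smoothed unigram model stands in for a white-box LLM, the fixed kernel stands in for learned convolution and self-attention layers, and the names `token_log_probs`, `smooth`, and `sentence_scores` are hypothetical.

```python
import math
from collections import Counter


def token_log_probs(tokens, counts, vocab_size):
    """Per-token log probabilities under a toy unigram model.

    Assumption: add-one smoothing over `counts` replaces the white-box
    LLM that would normally supply these values.
    """
    total = sum(counts.values())
    return [math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens]


def smooth(values, kernel=(0.25, 0.5, 0.25)):
    """Apply a fixed 1-D convolution with edge padding.

    Assumption: a hand-picked kernel, where a real detector would learn
    its filters. `values` must be non-empty.
    """
    half = len(kernel) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def sentence_scores(sentences, counts, vocab_size):
    """Mean smoothed log probability for each non-empty sentence."""
    scores = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        signal = smooth(token_log_probs(tokens, counts, vocab_size))
        scores.append(sum(signal) / len(signal))
    return scores


if __name__ == "__main__":
    reference = Counter("the cat sat on the mat".split())
    vocab_size = len(reference) + 1  # one extra slot for unseen tokens
    document = ["the cat sat", "quantum zebra flux"]
    for sentence, score in zip(document, sentence_scores(document, reference, vocab_size)):
        print(f"{score:.3f}  {sentence}")
```

In this toy setup a sentence made of tokens the reference model has seen receives a higher score than one made of unseen tokens. A sentence-level detector would feed such per-token signals to a trained classifier rather than compare raw averages.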

