Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

The remarkable text-generation capabilities of large language models such as ChatGPT have impressed readers and spurred researchers to devise detectors that mitigate potential risks, including misinformation, phishing, and academic dishonesty. Most previous studies, however, have focused on detectors that distinguish purely ChatGPT-generated texts from human-authored texts. Such detectors fail on texts produced through human-machine collaboration, such as ChatGPT-polished texts. To address this gap, we introduce a novel dataset, HPPT (ChatGPT-polished academic abstracts), which facilitates the construction of more robust detectors. Unlike existing corpora, it comprises pairs of human-written and ChatGPT-polished abstracts rather than purely ChatGPT-generated texts. We also propose the "Polish Ratio", a measure of how much ChatGPT modified the original human-written text, and therefore of the degree of ChatGPT influence in the resulting text. Experimental results show that our proposed model is more robust on the HPPT dataset and on two existing datasets (HC3 and CDB). Furthermore, the Polish Ratio offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement.
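The abstract does not give a formula for the Polish Ratio, so the sketch below is only an illustration of how a degree of modification between a human-written text and its polished version might be scored. It assumes two common text-distance measures, a word-level Jaccard distance and a word-level Levenshtein distance normalized by the longer text. The function names and the whitespace tokenization are hypothetical choices for this example, not the authors' implementation.

```python
def jaccard_distance(original: str, polished: str) -> float:
    """Illustrative score: 1 - |A ∩ B| / |A ∪ B| over lowercase word sets."""
    a = set(original.lower().split())
    b = set(polished.lower().split())
    if not a and not b:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def normalized_edit_distance(original: str, polished: str) -> float:
    """Illustrative score: word-level Levenshtein distance / longer length."""
    src = original.split()
    dst = polished.split()
    if not src and not dst:
        return 0.0
    # prev[j] holds the edit distance between the processed prefix of src
    # and the first j words of dst.
    prev = list(range(len(dst) + 1))
    for i, s in enumerate(src, start=1):
        curr = [i]
        for j, d in enumerate(dst, start=1):
            cost = 0 if s == d else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1] / max(len(src), len(dst))


if __name__ == "__main__":
    human = "we propose a method"
    polished = "we introduce a method"
    print(jaccard_distance(human, polished))
    print(normalized_edit_distance(human, polished))
```

Both scores lie in the range 0 to 1: 0 means the polished text is identical to the original under that measure, and values near 1 mean almost nothing of the original wording survives. A detector in the spirit of the abstract could be trained to predict such a score from the polished text alone, but that training setup is likewise an assumption here.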

