On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing

With ChatGPT under the spotlight, the use of large language models (LLMs) to assist academic writing has drawn significant debate in the community. In this paper, we present a comprehensive study of the detectability of ChatGPT-generated content in the academic literature, focusing on the abstracts of scientific papers, to support the future development of LLM applications and policies in academia. Specifically, we first present GPABench2, a benchmarking dataset of over 2.8 million comparative samples of human-written, GPT-written, GPT-completed, and GPT-polished abstracts of scientific writing in computer science, physics, and humanities and social sciences. Second, we explore the methodology for detecting ChatGPT content. We start by examining the unsatisfactory performance of existing ChatGPT detection tools and the challenges faced by human evaluators (including more than 240 researchers or students). We then test models based on hand-crafted linguistic features as a baseline and develop a deep neural framework named CheckGPT to better capture the subtle and deep semantic and linguistic patterns in ChatGPT-written literature. Last, we conduct comprehensive experiments to validate the proposed CheckGPT framework on each benchmarking task across different disciplines. To evaluate the detectability of ChatGPT content, we also conduct extensive experiments on the transferability, prompt engineering, and robustness of CheckGPT.


