Detection of ChatGPT Fake Science with the xFakeBibs Learning Algorithm

ChatGPT is becoming a new reality. In this paper, we demonstrate a method for distinguishing ChatGPT-generated publications from those written by scientists. We introduce a newly designed, supervised, network-driven algorithm that predicts whether content is machine-generated. The premise is that ChatGPT content exhibits distinctive behavior that sets it apart from scientific articles. The algorithm was trained and tested on publications covering three diseases, with each model constructed from 100 abstracts. The algorithm also underwent k-fold calibration (depending on the availability of the data) to establish a lower-upper bound range of acceptance. The ChatGPT training network showed fewer nodes and more edges than the networks built from real article abstracts. Executed in single mode, predicting the class of one type of dataset at a time, the algorithm achieved >94% accuracy. It was also executed in multi-mode on mixed documents of ChatGPT and PubMed abstracts. There, the algorithm predicted real articles with a precision of 100% and, on rare occasions, 96%-98%. However, ChatGPT content was often misclassified as real publications, with up to 88% accuracy in all datasets of the three diseases. Our results also showed that the publication year of the articles mixed with ChatGPT-generated content may be a factor in detecting the correct class: the older the publications, the better the prediction.
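The abstract does not say how the network is built or how the acceptance range is computed, so the following is only a minimal sketch of the general idea under stated assumptions: words are nodes, adjacent word pairs (bigrams) are edges, a document is scored by the fraction of its bigrams already present in the training network, and k-fold calibration keeps the lowest and highest held-out scores as the lower-upper bound. The function names (`bigrams`, `build_network`, `overlap_ratio`, `calibrate`, `predict`) and the scoring rule are hypothetical and are not taken from the paper.

```python
from itertools import chain


def bigrams(text):
    """Set of adjacent lowercased word pairs (assumed feature, not the paper's)."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    return set(zip(words, words[1:]))


def build_network(abstracts):
    """Return (nodes, edges) of a word network whose edges are bigrams."""
    edges = set().union(*(bigrams(a) for a in abstracts))
    nodes = set(chain.from_iterable(edges))
    return nodes, edges


def overlap_ratio(edges, abstract):
    """Fraction of an abstract's bigrams that are already edges in the model."""
    grams = bigrams(abstract)
    return len(grams & edges) / len(grams) if grams else 0.0


def calibrate(abstracts, k=5):
    """Hold out each of k folds, score it against a network built from the
    remaining folds, and return (lowest, highest) score as the acceptance range.
    Assumes `abstracts` is non-empty."""
    folds = [abstracts[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        rest = [a for j, fold in enumerate(folds) if j != i for a in fold]
        _, edges = build_network(rest)
        scores.extend(overlap_ratio(edges, a) for a in held_out)
    return min(scores), max(scores)


def predict(edges, bounds, abstract):
    """True if the abstract's score falls inside the calibrated range."""
    lower, upper = bounds
    return lower <= overlap_ratio(edges, abstract) <= upper
```

In this sketch, one network would be trained per class (for example, 100 ChatGPT abstracts or 100 PubMed abstracts for a disease), and comparing `len(nodes)` and `len(edges)` across the two networks gives the kind of node and edge counts the abstract contrasts.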


