ChatGPT is becoming a new reality. In this paper, we demonstrate a method for distinguishing ChatGPT-generated publications from those written by scientists. We introduce a newly designed supervised, network-driven algorithm for predicting machine-generated content. The premise is that ChatGPT content exhibits distinctive behavior that sets it apart from scientific articles. The algorithm was trained and tested on publications covering three diseases, with each model constructed from 100 abstracts. The algorithm also underwent k-fold calibration (depending on the availability of the data) to establish a lower-upper bound range of acceptance. The network training model built from ChatGPT content had fewer nodes and more edges than the models built from real article abstracts. Executed in single mode, where it predicts the class of one type of dataset at a time, the algorithm achieved an accuracy above 94%. It was also executed in multi-mode on mixed documents of ChatGPT and PubMed abstracts. The algorithm predicted real articles with a precision of 100% and, on rare occasions, 96%-98%. However, ChatGPT content was often misclassified as real publications, with up to 88% accuracy in all datasets of the three diseases. Our results also suggest that the publication year of the articles mixed with ChatGPT-generated content may be a factor in detecting the correct class: the older the publication, the better the prediction.
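The general idea of a network model with a calibrated acceptance range can be illustrated with a minimal sketch. The abstract does not specify how the networks are built or which score is calibrated, so everything below is an assumption made for illustration: words are treated as nodes, adjacent word pairs as edges, and a document is scored by the fraction of its edges that already appear in the class model. The function names (`build_network`, `edge_overlap`, `calibrate`, `in_class`) are hypothetical and do not come from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_network(abstracts):
    """Build a word network (assumed construction, not the paper's).

    Nodes are distinct words; edges are unordered pairs of adjacent words.
    Returns (nodes, edges) as sets.
    """
    nodes, edges = set(), set()
    for text in abstracts:
        words = tokenize(text)
        nodes.update(words)
        for a, b in zip(words, words[1:]):
            if a != b:
                edges.add(frozenset((a, b)))
    return nodes, edges


def edge_overlap(model_edges, text):
    """Fraction of the document's edges that also occur in the model."""
    _, doc_edges = build_network([text])
    if not doc_edges:
        return 0.0
    return len(doc_edges & model_edges) / len(doc_edges)


def calibrate(abstracts, k=5):
    """k-fold calibration of a lower-upper acceptance range.

    Each abstract is scored against a network built from the other folds;
    the minimum and maximum scores form the (lower, upper) bounds.
    """
    if len(abstracts) < k:
        raise ValueError("need at least k abstracts")
    scores = []
    for i in range(k):
        held_out = abstracts[i::k]
        train = [a for j, a in enumerate(abstracts) if j % k != i]
        _, model_edges = build_network(train)
        scores.extend(edge_overlap(model_edges, a) for a in held_out)
    return min(scores), max(scores)


def in_class(text, model_edges, bounds):
    """Accept the document if its score falls inside the calibrated range."""
    lower, upper = bounds
    return lower <= edge_overlap(model_edges, text) <= upper
```

Under these assumptions, one model would be built per class (for example, one from 100 ChatGPT abstracts and one from 100 PubMed abstracts for a given disease), `len(nodes)` and `len(edges)` would give the network sizes that the abstract compares, and `in_class` would test a new abstract against each class's calibrated range.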