ChatGPT and other generative AI tools are becoming a new reality. This work is motivated by the premise that ``ChatGPT content may exhibit a distinctive behavior that can be separated from scientific articles''. In this study, we show how we tested this premise in two phases and confirmed its validity. We then introduce xFakeSci, a novel learning algorithm capable of distinguishing ChatGPT-generated articles from publications written by scientists. The algorithm is trained on network models derived from multiple data sources, including ChatGPT-generated documents obtained through prompt engineering and PubMed articles. To mitigate overfitting, we incorporate a calibration step built on data-driven heuristics, including ratios. We evaluate the algorithm on multiple datasets spanning different publication periods and diseases (cancer, depression, and Alzheimer's). We also benchmark it against state-of-the-art (SOTA) algorithms: xFakeSci achieves F1 scores ranging from 80% to 94%, whereas the SOTA algorithms score between 38% and 52%. We attribute this notable difference to the calibration step and a proximity distance heuristic, which underpin this promising performance. Predicting fake science generated by ChatGPT remains a considerable challenge; nonetheless, the xFakeSci algorithm is a significant step toward combating fake science.
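The abstract does not specify how the network models, the ratio heuristics, or the proximity distance are defined. The following is a minimal sketch of one plausible reading, not the authors' xFakeSci implementation. It assumes three things: each document is turned into a word-bigram network, the heuristic is the ratio of distinct nodes to distinct edges, and calibration summarizes each class (for example ChatGPT vs. PubMed) by its mean ratio. A new document then receives the label of the nearest class mean. All function names here (`bigram_network`, `node_edge_ratio`, `calibrate`, `classify`) are hypothetical.

```python
import re
from statistics import mean


def bigram_network(text: str) -> tuple[set[str], set[tuple[str, str]]]:
    """Assumption: nodes are distinct lowercase word tokens and edges are
    distinct (directed) pairs of adjacent tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    nodes = set(tokens)
    edges = set(zip(tokens, tokens[1:]))
    return nodes, edges


def node_edge_ratio(text: str) -> float:
    """Hypothetical data-driven heuristic: distinct nodes / distinct edges.
    Returns 0.0 for documents too short to contain any edge."""
    nodes, edges = bigram_network(text)
    return len(nodes) / len(edges) if edges else 0.0


def calibrate(training: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical calibration: summarize each class by its mean ratio."""
    return {
        label: mean(node_edge_ratio(doc) for doc in docs)
        for label, docs in training.items()
    }


def classify(text: str, centroids: dict[str, float]) -> str:
    """Hypothetical proximity rule: choose the class whose calibrated ratio
    is closest to this document's ratio (ties go to the first label)."""
    r = node_edge_ratio(text)
    return min(centroids, key=lambda label: abs(centroids[label] - r))


if __name__ == "__main__":
    # Toy stand-ins for the two training sources; real corpora would be much larger.
    centroids = calibrate({"chatgpt": ["a b a b"], "pubmed": ["a b c d"]})
    print(centroids)
    print(classify("c d c d", centroids))
```

In this toy setup, text that keeps reusing the same word pairs has a lower node-to-edge ratio than text with more varied vocabulary. That is why `"c d c d"` lands nearer the `chatgpt` centroid. A real detector would need richer features and far more data; the sketch only shows how a ratio heuristic and a proximity distance could fit together.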