Generative AI systems such as ChatGPT have recently become available to the general public. Students can use these tools, for instance, to generate essays or entire theses. But how can a teacher tell whether a text was written by a student or by an AI? In our work, we explore traditional and new features to detect (1) text generated by AI from scratch and (2) text rephrased by AI. Because we found that classification is more difficult when the AI has been instructed to write in a way that a human would not recognize as AI-generated, we also investigate this more advanced case. For our experiments, we produced a new text corpus covering 10 school topics. Our best systems for classifying basic and advanced human-generated/AI-generated texts achieve F1-scores of over 96%. Our best systems for classifying basic and advanced human-generated/AI-rephrased texts achieve F1-scores of more than 78%. The systems use a combination of perplexity, semantic, list lookup, error-based, readability, AI feedback, and text vector features. Our results show that the new features substantially improve the performance of many classifiers. Our best basic text rephrasing detection system even outperforms GPTZero by a relative 183.8% in F1-score.
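To make the two reported metrics concrete, the following minimal sketch shows how an F1-score and a relative improvement over a baseline could be computed. The function names and the confusion counts are illustrative assumptions, not values or code from this work.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Made-up confusion counts for illustration only, NOT taken from the paper.
system_f1 = f1_score(tp=80, fp=20, fn=20)    # precision 0.8, recall 0.8
baseline_f1 = f1_score(tp=25, fp=25, fn=75)  # precision 0.5, recall 0.25

print(f"system F1:   {system_f1:.3f}")
print(f"baseline F1: {baseline_f1:.3f}")
print(f"relative improvement: {relative_improvement(system_f1, baseline_f1):.1f}%")
```

Under this definition, a relative improvement above 100% means the F1-score more than doubled; a relative gain of 183.8% corresponds to a score about 2.84 times that of the baseline.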