ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

ChatGPT can generate grammatically flawless, seemingly human replies to many types of questions across various domains. The number of its users and applications is growing at an unprecedented rate. Unfortunately, use and abuse go hand in hand. In this paper, we study whether a machine learning model can be trained to accurately distinguish between original human text and seemingly human (that is, ChatGPT-generated) text, especially when the text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews, with two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that distinguishing between human and ChatGPT-generated reviews is more challenging for the ML model when the text is rephrased. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite, lacks specific details, uses fancy and atypical vocabulary, is impersonal, and typically does not express feelings.
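The abstract mentions a perplexity score-based baseline without describing it. As a rough illustration only, the sketch below shows one common way such a baseline can work: compute perplexity from per-token log-probabilities and label low-perplexity text as machine-generated. The function names, the threshold of 10, the "lower perplexity means ChatGPT" rule, and the sample log-probabilities are all assumptions made for illustration; they are not taken from the paper, and in practice the log-probabilities would come from a language model.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity is exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify_by_perplexity(token_logprobs: Sequence[float], threshold: float) -> str:
    """Hypothetical rule: text the language model finds very predictable
    (perplexity below the threshold) is labeled 'chatgpt', otherwise 'human'."""
    return "chatgpt" if perplexity(token_logprobs) < threshold else "human"


# Made-up log-probabilities standing in for a language model's output.
predictable_text = [math.log(0.5)] * 4   # every token had probability 0.5
surprising_text = [math.log(0.05)] * 4   # every token had probability 0.05

ASSUMED_THRESHOLD = 10.0  # illustrative value, not from the paper

print(classify_by_perplexity(predictable_text, ASSUMED_THRESHOLD))
print(classify_by_perplexity(surprising_text, ASSUMED_THRESHOLD))
```

A single threshold like this can struggle on short texts, since the mean is taken over only a few tokens.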


