ChatGPT can generate grammatically flawless and seemingly human replies to many types of questions across various domains. The number of its users and applications is growing at an unprecedented rate. Unfortunately, use and abuse go hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human text and seemingly human (that is, ChatGPT-generated) text, especially when the text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews, and we conduct two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that distinguishing between human and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite, lacks specific details, uses fancy and atypical vocabulary, is impersonal, and typically does not express feelings.
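To make the perplexity score-based baseline mentioned in the abstract concrete, the following is a minimal sketch of how such a detector could work. It is not the authors' implementation: the abstract does not say which language model supplies the token probabilities, how the score is thresholded, or in which direction. The sketch assumes that per-token natural-log probabilities are already available from some language model, and that text with perplexity at or below a hypothetical `threshold` is labeled as ChatGPT-generated, a common heuristic rather than a detail confirmed by the source. The function names `perplexity` and `classify_by_perplexity` are illustrative.

```python
import math


def perplexity(token_logprobs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify_by_perplexity(token_logprobs, threshold):
    """Label a text from its perplexity.

    Assumption (not from the source): low perplexity means the text is
    highly predictable to the scoring model, so it is labeled 'chatgpt';
    anything above the threshold is labeled 'human'.
    """
    if perplexity(token_logprobs) <= threshold:
        return "chatgpt"
    return "human"


if __name__ == "__main__":
    # Toy log-probabilities, invented for illustration only.
    predictable = [math.log(0.5)] * 4
    surprising = [math.log(0.01)] * 3
    print(classify_by_perplexity(predictable, threshold=10.0))
    print(classify_by_perplexity(surprising, threshold=10.0))
```

In practice the threshold would have to be tuned on held-out labeled reviews, and a single scalar score of this kind gives no token-level explanation, which is one reason a fine-tuned classifier explained with SHAP is a natural point of comparison.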