Generative language models, such as ChatGPT, have garnered attention for their ability to generate human-like writing in various fields, including academic research. The rapid proliferation of generated texts has heightened the need for automatic identification to uphold transparency and trust in information. However, these generated texts closely resemble human writing and often differ only subtly in grammatical structure, tone, and patterns, which makes systematic scrutiny challenging. In this work, we attempt to detect abstracts generated by ChatGPT, which are comparatively short and bounded in length. We extract the texts' semantic and lexical properties and observe that traditional machine learning models can confidently detect these abstracts.
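As a rough illustration of what extracting lexical properties from an abstract might look like, the following minimal sketch computes a few surface statistics with the Python standard library only. The function name `lexical_features` and the chosen features (type-token ratio, average word length, average sentence length) are assumptions made for illustration; they are not necessarily the features used in this work, and semantic properties are not covered here.

```python
import re
from statistics import mean


def lexical_features(text: str) -> dict:
    """Compute a few simple lexical statistics for one abstract.

    Hypothetical feature set, chosen only for illustration.
    """
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens: runs of letters, optionally with an apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        # Empty or non-alphabetic input: return zeros instead of dividing by zero.
        return {
            "n_words": 0.0,
            "type_token_ratio": 0.0,
            "avg_word_len": 0.0,
            "avg_sentence_len": 0.0,
        }
    return {
        "n_words": float(len(words)),
        # Vocabulary diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        "avg_word_len": mean(len(w) for w in words),
        "avg_sentence_len": len(words) / len(sentences),
    }


features = lexical_features("The cat sat. The cat ran!")
```

A feature dictionary like this, computed per abstract, could then be turned into a numeric vector and passed to a traditional classifier such as logistic regression or a random forest; that step is omitted here.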