Large Language Models (LLMs) have garnered significant attention due to their impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is a recent addition to the family of language models and has been called a disruptive technology by some, owing to its human-like text-generation capabilities. Although many anecdotal examples across the internet have evaluated ChatGPT's strengths and weaknesses, only a few systematic research studies exist. To contribute to the body of systematic research on ChatGPT, we evaluate its performance on abstractive summarization by means of automated metrics and blinded human reviewers. We also build automatic text classifiers to detect ChatGPT-generated summaries. We find that while text classification algorithms can distinguish between real and generated summaries, humans are unable to distinguish between real summaries and those produced by ChatGPT.