Large Language Models (LLMs) have garnered significant attention due to their impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is a recent addition to the family of language models and is being called a disruptive technology by some, owing to its human-like text-generation capabilities. Although many anecdotal examples across the internet have evaluated ChatGPT's strengths and weaknesses, only a few systematic research studies exist. To contribute to the body of systematic research on ChatGPT, we evaluate the performance of ChatGPT on abstractive summarization by means of automated metrics and blinded human reviewers. We also build automatic text classifiers to detect ChatGPT-generated summaries. We found that while text classification algorithms can distinguish between real and generated summaries, humans are unable to distinguish between real summaries and those produced by ChatGPT.