Large Language Models (LLMs) have garnered significant attention due to their impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is a recent addition to the family of language models and has been called a disruptive technology by some, owing to its human-like text-generation capabilities. Although many anecdotal examples across the internet have evaluated ChatGPT's strengths and weaknesses, only a few systematic research studies exist. To contribute to the body of systematic research on ChatGPT, we evaluate its performance on abstractive summarization by means of automated metrics and blinded human reviewers. We also build automatic text classifiers to detect ChatGPT-generated summaries. We find that while text classification algorithms can distinguish between real and generated summaries, humans are unable to distinguish between real summaries and those produced by ChatGPT.