Extractive summarization is a crucial task in natural language processing that aims to condense long documents into shorter versions by directly extracting sentences. The recent introduction of ChatGPT has attracted significant interest in the NLP community due to its remarkable performance on a wide range of downstream tasks. However, concerns regarding factuality and faithfulness have hindered its practical application in summarization systems. This paper first presents a thorough evaluation of ChatGPT's performance on extractive summarization and compares it with traditional fine-tuning methods on various benchmark datasets. Our experimental analysis reveals that ChatGPT's extractive summarization performance is still inferior to that of existing supervised systems in terms of ROUGE scores. In addition, we explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance. Furthermore, we find that applying an extract-then-generate pipeline with ChatGPT yields significant improvements in summary faithfulness over abstractive baselines. These observations highlight potential directions for enhancing ChatGPT's capabilities on faithful text summarization tasks using two-stage approaches.