A primary objective of news articles is to establish the factual record for an event, which is frequently achieved by conveying both the details of the event (i.e., the 5 Ws: Who, What, Where, When, and Why) and how people reacted to it (i.e., reported statements). However, existing work on news summarization focuses almost exclusively on the event details. In this work, we propose the novel task of summarizing the reactions of different speakers to a given event, as expressed by their reported statements. To this end, we create a new multi-document summarization benchmark, SUMREN, comprising 745 summaries of reported statements from various public figures, obtained from 633 news articles discussing 132 events. We propose an automatic silver training data generation approach for our task, which helps smaller models like BART achieve GPT-3-level performance. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show generates summaries that are more abstractive and factual than those of baseline query-focused summarization approaches.