The performance of automatic summarization models has improved dramatically in recent years. Yet a gap remains in meeting users' specific information needs in real-world scenarios, particularly when a targeted summary is sought, as in the aspect-based summarization setting addressed in this paper. Previous datasets and studies for this setting have predominantly concentrated on a limited set of pre-defined aspects, focused solely on single-document inputs, or relied on synthetic data. To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document \textit{open} aspect-based summarization. The benchmark is created using a novel and cost-effective annotation protocol that derives an open-aspect dataset from existing generic multi-document summarization datasets. We analyze the properties of OpenAsp, showcasing its high-quality content. Further, we show that the realistic open-aspect setting realized in OpenAsp poses a challenge for current state-of-the-art summarization models, as well as for large language models.