Considerable effort has gone into developing summarization datasets. However, most of these resources have been generated (semi-)automatically, typically by crawling web data, which yields subpar resources for training and evaluating summarization systems. This quality compromise is arguably due to the substantial cost of producing ground-truth summaries, particularly for diverse languages and specialized domains. To address this issue, we present ACLSum, a novel summarization dataset carefully crafted and evaluated by domain experts. In contrast to previous datasets, ACLSum facilitates multi-aspect summarization of scientific papers, covering challenges, approaches, and outcomes in depth. Through extensive experiments, we evaluate the quality of our resource and the performance of models based on pretrained language models and state-of-the-art large language models (LLMs). Additionally, we explore the effectiveness of extractive versus abstractive summarization within the scholarly domain on the basis of automatically discovered aspects. Our results corroborate previous findings in the general domain and indicate the general superiority of end-to-end aspect-based summarization. Our data is released at https://github.com/sobamchan/aclsum.