Narrative summarization aims to produce a distilled version of a narrative that describes its most salient events and characters. Summarizing a narrative is challenging because it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, collected from plot descriptions of movies and TV episodes across diverse genres, together with their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.