We present a literature review of Automatic Text Summarization (ATS) systems, following a citation-based approach. Starting from a set of popular, well-known papers that we had in hand on each topic we wanted to cover, we tracked their "backward citations" (papers cited by that initial set) and their "forward citations" (newer papers that cite the initial set). To organize the methods, we group the diverse approaches to ATS by the mechanisms they use to generate a summary. Beyond the methods themselves, we provide an extensive review of the datasets available for summarization tasks and of the methods used to evaluate summary quality. Finally, we present an empirical exploration of these methods using the CNN Corpus dataset, which provides gold-standard summaries for both extractive and abstractive methods.
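The citation-based search described above can be illustrated with a minimal sketch. It assumes the citation data is available as a plain mapping from a paper id to the ids it cites; the function name `snowball`, the `max_hops` limit, and the toy paper ids are all hypothetical and are not taken from the review, which does not state how many hops were followed.

```python
from collections import deque


def snowball(seeds, cites, max_hops=1):
    """Collect papers reachable from `seeds` via backward and forward citations.

    `cites` maps a paper id to the set of paper ids it cites (assumed input format).
    `max_hops` is an illustrative cut-off, not a parameter from the review.
    """
    # Invert the graph to obtain forward citations: cited paper -> citing papers.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    found = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        paper, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        # Backward citations (what this paper cites) plus forward citations
        # (what cites this paper).
        neighbours = cites.get(paper, set()) | cited_by.get(paper, set())
        for other in sorted(neighbours):
            if other not in found:
                found.add(other)
                queue.append((other, hops + 1))
    return found


# Toy citation graph with made-up paper ids.
toy_cites = {
    "seed": {"old_a", "old_b"},
    "new_x": {"seed"},
    "new_y": {"new_x"},
    "old_a": {"older"},
}
one_hop = snowball(["seed"], toy_cites, max_hops=1)
two_hops = snowball(["seed"], toy_cites, max_hops=2)
```

With one hop the search returns the seed plus the papers it cites and the papers that cite it; raising `max_hops` widens the set in both directions, which is why some cut-off or manual relevance filter is usually needed in practice.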