We present a literature review of Automatic Text Summarization (ATS) systems, following a citation-based approach. Starting from a set of popular, well-known papers on each topic we want to cover, we track their "backward citations" (papers cited by this initial set) and their "forward citations" (newer papers that cite this initial set). To organize the different methods, we present the diverse approaches to ATS according to the mechanisms they use to generate a summary. Besides the methods themselves, we also present an extensive review of the datasets available for summarization tasks and of the methods used to evaluate the quality of summaries. Finally, we present an empirical exploration of these methods using the CNN Corpus dataset, which provides golden summaries for both extractive and abstractive methods.