Recent Transformer-based summarization models offer a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to handle more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have two shortcomings: (1) they often perform poorly in content selection, and (2) their training strategy is inefficient, which restricts model performance. In this paper, we explore two orthogonal ways to compensate for these shortcomings. First, we augment the Transformer network with a sentence cross-attention module in the decoder, encouraging more abstraction of salient content. Second, we adopt a curriculum learning approach that reweights the training samples, yielding a more efficient learning procedure. The second approach, which enhances the training strategy of Transformer networks, yields stronger gains than the first. We apply our model to the extreme summarization dataset of Reddit TIFU posts. We further examine three cross-domain summarization datasets (Webis-TLDR-17, CNN/DM, and XSum) to measure the efficacy of curriculum learning when applied to summarization. Moreover, we conduct a human evaluation to show the efficacy of the proposed method in terms of qualitative criteria, namely fluency, informativeness, and overall quality.
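To make the idea of reweighting training samples under a curriculum more concrete, the following is a minimal sketch only. The abstract does not specify the weighting scheme, so everything here is an assumption for illustration: the per-sample difficulty scores in [0, 1], the linear schedule, and the hypothetical helper names `curriculum_weights` and `weighted_loss` are not taken from the paper.

```python
def curriculum_weights(difficulties, step, total_steps):
    """Return one loss weight per sample (illustrative assumption, not the paper's scheme).

    difficulties: assumed per-sample difficulty scores in [0, 1] (0 = easiest).
    Early in training, hard samples are down-weighted; by the final step
    every sample has weight 1.
    """
    # Training progress, clipped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Linear schedule: the weight rises from (1 - difficulty) to 1 as progress goes 0 -> 1.
    return [1.0 - (1.0 - progress) * d for d in difficulties]


def weighted_loss(losses, weights):
    """Weighted average of per-sample losses; returns 0.0 if all weights are zero."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total


if __name__ == "__main__":
    difficulties = [0.0, 0.5, 1.0]   # hypothetical scores for three samples
    losses = [2.0, 3.0, 4.0]         # hypothetical per-sample losses
    for step in (0, 5, 10):
        w = curriculum_weights(difficulties, step, total_steps=10)
        print(step, w, weighted_loss(losses, w))
```

In a real training loop the weights would multiply the per-sample sequence losses before averaging over the batch; how difficulty is estimated and how the schedule evolves are design choices that this sketch leaves open.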