Recent Transformer-based models offer a promising approach to abstractive summarization. They go beyond sentence selection and other extractive strategies to handle more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have two shortcomings: (1) they often perform poorly at content selection, and (2) their training strategy is inefficient, which restricts model performance. In this paper, we explore two orthogonal ways to compensate for these shortcomings. First, we augment the Transformer network with a sentence cross-attention module in the decoder, encouraging more abstraction of salient content. Second, we add a curriculum learning approach that reweights the training samples, yielding a more efficient learning procedure. Of the two, the curriculum learning approach yields the larger gains. We apply our model to the extreme summarization dataset of Reddit TIFU posts. We further examine three cross-domain summarization datasets (Webis-TLDR-17, CNN/DM, and XSum) to measure the efficacy of curriculum learning when applied to summarization. Finally, we conduct a human evaluation to demonstrate the efficacy of the proposed method on qualitative criteria, namely fluency, informativeness, and overall quality.
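To make the idea of reweighting training samples concrete, the following is a minimal sketch of one possible easy-first weighting scheme. It is not the method used in the paper, whose weighting function the abstract does not specify. The function names `curriculum_weights` and `weighted_loss`, the use of per-sample loss as a difficulty proxy, the exponential form, and the linear decay schedule are all assumptions made for illustration.

```python
import math


def curriculum_weights(losses, progress, temperature=1.0):
    """Return per-sample weights normalized to have mean 1.

    Hypothetical scheme (assumption, not the paper's method):
    - `losses` are per-sample losses, used as a difficulty proxy.
    - `progress` is the fraction of training completed, in [0, 1].
    Early in training, low-loss (easy) samples get larger weights;
    the bias decays linearly so that weights are uniform at the end.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")

    mean_loss = sum(losses) / len(losses)
    # Strength of the easy-first bias; zero when progress == 1.
    strength = (1.0 - progress) / temperature
    raw = [math.exp(-strength * (loss - mean_loss)) for loss in losses]

    # Normalize so the weights average to 1 and the loss scale is preserved.
    total = sum(raw)
    return [w * len(raw) / total for w in raw]


def weighted_loss(losses, weights):
    """Average of per-sample losses after applying the curriculum weights."""
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)


if __name__ == "__main__":
    batch_losses = [0.5, 2.0, 4.0]
    for progress in (0.0, 0.5, 1.0):
        w = curriculum_weights(batch_losses, progress)
        print(progress, [round(x, 3) for x in w], round(weighted_loss(batch_losses, w), 3))
```

In an actual training loop the weights would typically be computed from detached loss values, so that gradients flow only through the per-sample losses and not through the weighting function.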