Exposure bias is a major problem for the sequence-to-sequence neural models widely used in abstractive summarization. Re-ranking systems have been applied in recent years to alleviate it, but despite some performance improvements, this approach remains underexplored. Previous work has mostly specified the ranking through ROUGE scores and aligned candidate summaries accordingly, but there can be a large gap between a lexical overlap metric and semantic similarity. In this paper, we propose a novel training method in which a re-ranker balances lexical and semantic quality. We further define false positives in ranking and present a strategy to reduce their influence. Experiments on the CNN/DailyMail and XSum datasets show that our method can estimate the meaning of summaries without seriously degrading the lexical aspect. More specifically, it achieves a BERTScore of 89.67 on the CNN/DailyMail dataset, reaching new state-of-the-art performance. Our code is publicly available at https://github.com/jeewoo1025/BalSum.