With the rise of task-specific pre-training objectives, abstractive summarization models such as PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind that of their supervised counterparts. As in the supervised setup, we observe very high variance in quality among the summary candidates these models generate, even though only one candidate is kept as the final summary. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the unsupervised PEGASUS by up to 7.27% and ChatGPT by up to 6.86% relative mean ROUGE across four widely adopted summarization benchmarks, and achieves relative gains of 7.51% on average (up to 23.73%, from XSum to WikiHow) over 30 zero-shot transfer setups (finetuning on one dataset, evaluating on another).