Evaluating summary quality is a non-trivial task in text summarization. Contemporary methods fall mainly into two scenarios: (1) reference-based, which evaluates a summary against a human-labeled reference summary; and (2) reference-free, which evaluates the summary's consistency with the source document. Recent studies mainly focus on one of these scenarios and train neural models built on pre-trained language models (PLMs) to align with human criteria. However, the models for different scenarios are optimized individually, which may result in sub-optimal performance because they neglect the knowledge shared across scenarios. In addition, designing a separate model for each scenario is inconvenient for users. Motivated by this, we propose the Unified Multi-scenario Summarization Evaluation Model (UMSE). More specifically, we propose a perturbed prefix tuning method to share knowledge across scenarios, and we use a self-supervised training paradigm to optimize the model without extra human labeling. UMSE is the first unified summarization evaluation framework that can be used in three evaluation scenarios. Experimental results across three typical scenarios on the benchmark dataset SummEval indicate that UMSE achieves performance comparable to several strong existing methods that are each designed specifically for a single scenario.