Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. Previous works on semi-supervised learning address label noise but focus primarily on natural language understanding tasks, where each sample is assumed to have a unique label. These methods are not directly applicable to SSDS, which is a generative task in which each dialogue can be summarized in multiple valid ways. In this work, we propose a novel scoring approach, SiCF, which encapsulates three primary dimensions of summarization model quality: Semantic invariance (indicative of model confidence), Coverage (factual recall), and Faithfulness (factual precision). Using the SiCF score, we select unlabeled dialogues with high-quality generated summaries to train summarization models. Comprehensive experiments on three public datasets demonstrate the effectiveness of SiCF scores in uncertainty estimation and semi-supervised learning for dialogue summarization. Our code is available at \url{https://github.com/amazon-science/summarization-sicf-score}.