Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing dialogue summarization methods have two limitations: they do not take into account the inherent structure of dialogue, and they rely heavily on labeled data, which can lead to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one is produced by a fine-tuned summarization model, and the other is a collection of dialogue turns that convey important information. We then select one of these two pseudo summaries for each example, based on how information is distributed differently across types of dialogue. The selected pseudo summary serves as the target for self-supervised pre-training of DIONYSUS on a large dialogue corpus. Our experiments show that, in both zero-shot and few-shot settings, DIONYSUS achieves higher ROUGE scores than existing methods on six datasets.
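The construction and selection of pseudo summaries can be illustrated with a small sketch. It is hedged and written under our own assumptions, and it is not the DIONYSUS implementation. The simplified unigram ROUGE-1 score, the greedy turn-extraction heuristic (modeled on PEGASUS-style gap-sentence selection), and the selection rule (keep whichever pseudo summary has the higher ROUGE-1 F1 against the dialogue) are all hypothetical stand-ins. The `generated` argument represents the output of the fine-tuned summarization model, which is not shown. The helper names `rouge1_f1`, `extract_principal_turns`, and `choose_pseudo_summary` are likewise hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: lowercase whitespace tokens, no stemming (assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def extract_principal_turns(turns: List[str], k: int) -> List[int]:
    """Greedily pick up to k turn indices whose concatenation best overlaps the
    remaining turns (hypothetical PEGASUS-style heuristic, not the paper's code)."""
    selected: List[int] = []
    best_score = 0.0
    for _ in range(min(k, len(turns))):
        best_idx = None
        for i in range(len(turns)):
            if i in selected:
                continue
            cand_ids = sorted(selected + [i])
            cand = " ".join(turns[j] for j in cand_ids)
            rest = " ".join(t for j, t in enumerate(turns) if j not in cand_ids)
            score = rouge1_f1(cand, rest)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining turn improves the score, so stop early
            break
        selected.append(best_idx)
    return sorted(selected)


def choose_pseudo_summary(turns: List[str], generated: str, k: int = 2) -> Tuple[str, str]:
    """Return a (source, target) pre-training pair for one dialogue.
    Hypothetical selection rule: keep the pseudo summary with higher ROUGE-1 F1
    against the full dialogue; ties go to the model-generated summary."""
    idx = extract_principal_turns(turns, k)
    principal = " ".join(turns[i] for i in idx)
    dialogue = " ".join(turns)
    if rouge1_f1(generated, dialogue) >= rouge1_f1(principal, dialogue):
        return dialogue, generated
    return dialogue, principal
```

In this sketch the source is always the full dialogue. A real pre-training setup might instead mask or remove the extracted turns from the input, as PEGASUS does with gap sentences. That choice is left open here because the abstract does not specify it.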