Multi-document summarization aims to extract the core information from a collection of documents on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. The method pairs a holistic beam search inference procedure with a holistic measure, the Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences drawn from the source documents and can be computed in both unsupervised and adaptive manners. To demonstrate the method's effectiveness, we conduct extensive experiments on small- and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as measured by ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summarization performance.
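The idea of scoring whole subsets of sentences, rather than ranking sentences one at a time, can be illustrated with a minimal sketch. The abstract does not give the SRI formula, so `subset_score` below is a hypothetical stand-in (mean importance minus a weighted mean pairwise cosine similarity), and the names `holistic_beam_search`, `lam`, and `beam_width` are assumptions made for illustration only, not the paper's definitions.

```python
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def subset_score(subset, importance, vectors, lam=0.5):
    """Hypothetical stand-in for SRI (NOT the paper's formula):
    mean importance of the subset minus lam times its mean pairwise similarity."""
    mean_importance = sum(importance[i] for i in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    if not pairs:
        return mean_importance
    redundancy = sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
    return mean_importance - lam * redundancy


def holistic_beam_search(importance, vectors, k, beam_width=3, lam=0.5):
    """Grow candidate summaries one sentence at a time, keeping the
    beam_width subsets with the best subset-level score at each step."""
    n = len(importance)
    k = min(k, n)
    beams = [()]
    for _ in range(k):
        candidates = set()
        for subset in beams:
            for i in range(n):
                if i not in subset:
                    # Sort indices so the same subset is never counted twice.
                    candidates.add(tuple(sorted(subset + (i,))))
        # Rank by score (descending); break ties by index order for determinism.
        ranked = sorted(
            candidates,
            key=lambda s: (-subset_score(s, importance, vectors, lam), s),
        )
        beams = ranked[:beam_width]
    return list(beams[0])


# Toy input: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
importance = [0.9, 0.85, 0.5]
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
summary = holistic_beam_search(importance, vectors, k=2)
```

In this toy input, the redundancy penalty makes the search prefer sentence 2 over the higher-scoring but duplicated sentence 1; setting `lam=0` removes the diversity term and the two most important sentences are chosen instead.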