Multi-document summarization (MDS) aims to generate a summary for a set of related documents. We propose HGSUM, an MDS model that extends an encoder-decoder architecture with a heterogeneous graph representing different semantic units (e.g., words and sentences) of the documents. This contrasts with existing MDS models, which do not consider different edge types in their graphs and therefore do not capture the diversity of relationships in the documents. To preserve only the key information and relationships of the documents in the heterogeneous graph, HGSUM uses graph pooling to compress the input graph. To guide HGSUM in learning this compression, we introduce an additional training objective that maximizes the similarity between the compressed graph and the graph constructed from the ground-truth summary. HGSUM is trained end-to-end with the graph similarity and standard cross-entropy objectives. Experimental results on MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. The code for our model and experiments is available at: https://github.com/oaimli/HGSum.
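As a rough illustration of how a graph similarity objective can be combined with a cross-entropy objective, the following minimal sketch adds the two terms into one loss. It is not taken from the HGSum repository: the function names, the use of cosine similarity over fixed-size graph vectors, and the `weight` parameter are all assumptions made for illustration, and the actual model may measure similarity and balance the terms differently.

```python
import math


def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference summary tokens.

    token_probs: probabilities the decoder assigned to each gold token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def joint_loss(token_probs, compressed_graph_vec, summary_graph_vec, weight=1.0):
    """Cross-entropy plus a penalty that shrinks as the graphs become similar.

    Assumption: each graph is already summarized as one vector, and
    maximizing similarity is expressed as minimizing (1 - cosine).
    `weight` is a hypothetical balancing coefficient.
    """
    similarity = cosine_similarity(compressed_graph_vec, summary_graph_vec)
    return cross_entropy(token_probs) + weight * (1.0 - similarity)
```

With identical graph vectors the similarity penalty vanishes and only the cross-entropy term remains; with orthogonal vectors the penalty equals `weight`.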