Do Multi-Document Summarization Models Synthesize?

From arXiv. 22 pages, 13 figures, 22 tables. ACL-formatted paper; expanded version of a rejected ICLR submission (https://openreview.net/forum?id=1PTeB4MWCfU). The paper was de-anonymized ahead of ICLR de-anonymization due to ACL policies and an additional conference submission.

Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately *synthesize* inputs with respect to a key property or aspect. For example, a synopsis of a set of reviews of a particular movie should reflect the average critic consensus. As a more consequential example, consider the narrative summaries that accompany biomedical *systematic reviews* of clinical trial results. These narratives should fairly summarize the potentially conflicting results of individual trials. In this paper we ask: to what extent do modern multi-document summarization models implicitly perform this type of synthesis? To assess this, we perform a suite of experiments that probe the degree to which conditional generation models trained for summarization using standard methods yield outputs that appropriately synthesize their inputs. We find that existing models do partially perform synthesis, but imperfectly. In particular, they are over-sensitive to changes in input ordering and under-sensitive to changes in input composition (e.g., the ratio of positive to negative movie reviews). We propose a simple, general method for improving model synthesis capabilities: generate an explicitly diverse set of candidate outputs, then select from these the string best aligned with the expected aggregate measure for the inputs, or *abstain* when the model produces no good candidate. This approach improves model synthesis performance. We hope that highlighting the need for synthesis (in some summarization settings) motivates further research into multi-document summarization methods and learning objectives that explicitly account for the need to synthesize.
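The select-or-abstain step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate generation (for example, some form of diverse decoding) is assumed to happen elsewhere; `score_fn` is a hypothetical property measure, such as a sentiment classifier returning a value in [0, 1]; the expected aggregate is assumed to be the mean of the per-input scores (e.g., the fraction of positive reviews); and `tolerance` is an illustrative abstention threshold.

```python
from statistics import mean
from typing import Callable, Optional, Sequence


def select_or_abstain(
    candidates: Sequence[str],
    input_scores: Sequence[float],
    score_fn: Callable[[str], float],
    tolerance: float = 0.1,
) -> Optional[str]:
    """Return the candidate whose measured property is closest to the
    expected aggregate of the inputs, or None (abstain) if none is close.

    Hypothetical sketch: `score_fn` stands in for any property measure
    (e.g. a sentiment classifier), and the expected aggregate is assumed
    to be the mean of the per-input scores.
    """
    if not candidates:
        return None  # nothing to choose from, so abstain
    target = mean(input_scores)  # assumed expected aggregate measure
    # Pick the candidate whose property value is nearest the target.
    best = min(candidates, key=lambda c: abs(score_fn(c) - target))
    if abs(score_fn(best) - target) > tolerance:
        return None  # even the best candidate is misaligned, so abstain
    return best


if __name__ == "__main__":
    # Toy scorer: a lookup table standing in for a sentiment model.
    toy_sentiment = {
        "Critics loved it.": 0.9,
        "Reviews were mixed.": 0.5,
        "Critics panned it.": 0.1,
    }
    reviews = [1.0, 0.0, 1.0, 0.0]  # two positive, two negative inputs
    print(select_or_abstain(list(toy_sentiment), reviews,
                            lambda s: toy_sentiment[s]))
```

With an evenly split set of reviews, the sketch picks "Reviews were mixed."; if only the two one-sided candidates were available, it would return `None`. Abstaining rather than returning the least-bad candidate reflects the idea that, in settings such as systematic reviews, emitting no summary can be preferable to emitting one that misrepresents the inputs.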
