Identifying the differences between two versions of the same article is useful for updating knowledge bases and for understanding how articles evolve. Paired texts occur naturally in diverse situations: reporters write similar news stories, and maintainers of authoritative websites must keep their information up to date. We propose representing factual changes between paired documents as question-answer pairs, where the answer to the same question differs between the two versions. We find that question-answer pairs can flexibly and concisely capture the updated content. Given paired documents, annotators identify questions that are answered by one passage but are answered differently, or cannot be answered, by the other. We release DIFFQG, which consists of 759 QA pairs and 1153 examples of paired passages with no factual change. These questions are intended to be both unambiguous and information-seeking, and they involve complex edits, pushing beyond the capabilities of current question generation and factual change detection systems. Our dataset summarizes the changes between two versions of a document as questions and answers, studying automatic update summarization in a novel way.
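As a rough illustration of the representation described above, a single factual change could be modeled as one question with an answer per passage version. This is only a sketch: the class name `QAChange`, its fields, and the `summarize` helper are hypothetical and are not taken from the released DIFFQG schema, and a question that a passage cannot answer is assumed to be stored as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QAChange:
    """Hypothetical record: one question and its answer in each passage version."""
    question: str
    old_answer: Optional[str]  # None: the old passage cannot answer the question
    new_answer: Optional[str]  # None: the new passage cannot answer the question

    def is_factual_change(self) -> bool:
        # A change exists when the two versions answer differently,
        # including the case where only one version can answer at all.
        return self.old_answer != self.new_answer


def summarize(changes: List[QAChange]) -> List[str]:
    """Render only the changed questions as a short update summary."""
    lines = []
    for c in changes:
        if c.is_factual_change():
            old = c.old_answer if c.old_answer is not None else "(unanswerable)"
            new = c.new_answer if c.new_answer is not None else "(unanswerable)"
            lines.append(f"{c.question} {old} -> {new}")
    return lines
```

Under these assumptions, a pair of passages with no factual change simply yields an empty summary, while each changed fact contributes one line that names the question and both answers.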