Evidence plays a crucial role in automated fact-checking. When verifying real-world claims, existing fact-checking systems either assume that the evidence sentences are given or use the search snippets returned by a search engine. Such methods ignore the challenges of collecting evidence and may not provide enough information to verify real-world claims. To build a better fact-checking system, we propose incorporating the full text of source documents as evidence and introduce two enriched datasets: the first is multilingual, and the second is monolingual (English). We further develop a latent variable model that jointly extracts evidence sentences from documents and performs claim verification. Experiments indicate that including source documents can provide sufficient contextual clues even when gold evidence sentences are not annotated. The proposed system achieves significant improvements over the best-reported models under different settings.
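The abstract does not specify the form of the latent variable model. One common way to set up joint evidence extraction and verification, assumed here and not necessarily the authors' formulation, treats the index z of the evidence sentence as a latent variable over the document's sentences s_1..s_n: p(y | c, D) = sum_i p(z = i | c, D) * p(y | c, s_i). Verification uses the marginal over z, and evidence extraction reads off the posterior p(z = i | y, c, D). The minimal sketch below uses hypothetical scores and label probabilities in place of trained neural scorers; the function names and the three-way label set are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical label set, assumed for illustration only.
LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_verdict(evidence_logits: Sequence[float],
                     label_probs: Sequence[Sequence[float]]) -> List[float]:
    """p(y | c, D) = sum_i p(z = i | c, D) * p(y | c, s_i).

    evidence_logits[i] stands in for a (hypothetical) relevance score of
    sentence i; label_probs[i][y] stands in for p(y | c, s_i).
    """
    prior = softmax(evidence_logits)
    n_labels = len(label_probs[0])
    return [sum(prior[i] * label_probs[i][y] for i in range(len(prior)))
            for y in range(n_labels)]


def evidence_posterior(evidence_logits: Sequence[float],
                       label_probs: Sequence[Sequence[float]],
                       y: int) -> List[float]:
    """p(z = i | y, c, D), proportional to p(z = i | c, D) * p(y | c, s_i)."""
    prior = softmax(evidence_logits)
    joint = [prior[i] * label_probs[i][y] for i in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]


# Toy example: three sentences from one source document (made-up numbers).
logits = [2.0, 0.0, -1.0]
probs = [[0.8, 0.1, 0.1],
         [0.2, 0.3, 0.5],
         [0.3, 0.3, 0.4]]

marginal = marginal_verdict(logits, probs)
predicted = max(range(len(marginal)), key=marginal.__getitem__)
posterior = evidence_posterior(logits, probs, predicted)
best_sentence = max(range(len(posterior)), key=posterior.__getitem__)
print(LABELS[predicted], [round(p, 3) for p in posterior], best_sentence)
```

Under this setup, training would typically minimize -log p(y | c, D) using only the claim label, which is one way such a model can learn to select evidence sentences even when gold evidence is not annotated.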