Existing datasets for automated fact-checking have substantial limitations, such as relying on artificial claims, lacking annotations for evidence and intermediate reasoning, or including evidence published after the claim. In this paper we introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. Through a multi-round annotation process, we avoid common pitfalls including context dependence, evidence insufficiency, and temporal leakage, and reach a substantial inter-annotator agreement of $\kappa=0.619$ on verdicts. We develop a baseline as well as an evaluation scheme for verifying claims through several question-answering steps against the open web.
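To make the reported agreement figure concrete, the following is a minimal sketch of how a chance-corrected agreement score can be computed over verdict labels. It assumes two annotators and uses Cohen's $\kappa$; the abstract does not state which $\kappa$ variant underlies the reported 0.619, so this choice is an assumption. The function name `cohens_kappa` and the example verdict labels are hypothetical and purely illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative only: the kappa variant behind the reported 0.619
    is not given in the abstract, so Cohen's kappa is an assumption.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical verdicts.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement under independence, from each annotator's
    # marginal label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Degenerate case: both annotators always use the same single label.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts from two annotators on four claims.
annotator_1 = ["supported", "refuted", "supported", "refuted"]
annotator_2 = ["supported", "refuted", "refuted", "refuted"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

In this toy example the annotators agree on three of four claims (observed agreement 0.75), while their label distributions alone would yield 0.5 agreement by chance, giving $\kappa = (0.75 - 0.5) / (1 - 0.5) = 0.5$.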