Automated fact checking has attracted considerable interest as a way to tackle growing misinformation in the digital era. Existing systems primarily focus on synthetic claims based on Wikipedia, although noteworthy progress has also been made on real-world claims. In this work, we release Numtemp, a diverse, multi-domain dataset focused exclusively on numerical claims, encompassing temporal, statistical and other diverse aspects, with fine-grained metadata and an evidence collection free of leakage. Numtemp addresses the challenge of verifying real-world numerical claims, which are complex and often lack precise information, a challenge not addressed by existing work that mainly focuses on synthetic claims. We evaluate and quantify the limitations of existing solutions for verifying numerical claims. We also evaluate claim-decomposition-based methods and numerical-understanding-based models; our best baseline achieves a macro-F1 of 58.32. This demonstrates that Numtemp serves as a challenging evaluation set for numerical claim verification.