We present a new fact-checking benchmark, Check-COVID, that requires systems to verify claims about COVID-19 from news using evidence from scientific articles. This kind of fact-checking is particularly challenging because it requires checking internet text written in everyday language against evidence from journal articles written in formal academic language. Check-COVID contains 1,504 expert-annotated news claims about the coronavirus, each paired with sentence-level evidence from scientific journal articles and a veracity label. It includes both extracted (journalist-written) and composed (annotator-written) claims. We run experiments with a fact-checking-specific system and with GPT-3.5, which achieve F1 scores of 76.99 and 69.90, respectively. These results show that both claim types are difficult to fact-check automatically and that in-domain data is important for good performance. Our data and models are publicly available at https://github.com/posuer/Check-COVID.