Scientific fact-checking is crucial for ensuring the accuracy, reliability, and trustworthiness of scientific claims. However, existing benchmarks are limited by their lack of claim diversity, their reliance on text-based evidence, and their oversimplification of scientific reasoning. To address these gaps, we introduce SCITAB, a novel dataset comprising 1,225 challenging scientific claims that require compositional reasoning over scientific tables. The claims in SCITAB are derived from actual scientific statements, and the evidence is presented as tables, closely mirroring real-world fact-checking scenarios. We establish benchmarks on SCITAB using state-of-the-art models, revealing its inherent difficulty and highlighting the limitations of existing prompting methods. Our error analysis identifies unique challenges, including ambiguous expressions and irrelevant claims, and suggests directions for future research. The code and data are publicly available at https://github.com/XinyuanLu00/SciTab.