We present SciClaimEval, a new scientific dataset for the claim verification task. Unlike existing resources, SciClaimEval features authentic claims, including refuted ones, extracted directly from published papers. To create refuted claims, we introduce a novel approach that modifies the supporting evidence (figures and tables) rather than altering the claims or relying on large language models (LLMs) to fabricate contradictions. The dataset provides cross-modal evidence in diverse representations: figures are available as images, while tables are provided in multiple formats, including images, LaTeX source, HTML, and JSON. SciClaimEval contains 1,664 annotated samples from 180 papers across three domains (machine learning, natural language processing, and medicine), validated through expert annotation. We benchmark 11 multimodal foundation models, both open-source and proprietary, on the dataset. Results show that figure-based verification is particularly challenging for all models, with a substantial performance gap remaining between the best system and the human baseline.