Without accurate transcription of numerical data in scientific documents, a scientist cannot draw accurate conclusions. Unfortunately, the process of copying numerical data from one paper to another is prone to human error. In this paper, we address this challenge by introducing the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources. To support this task, we present a new benchmark, arXiVeri, which comprises tabular data drawn from open-access academic papers on arXiv. We introduce metrics to evaluate the performance of a table verifier in two key areas: (i) table matching, which aims to identify the source table in a cited document that corresponds to a target table, and (ii) cell matching, which aims to locate the cells shared between a target and a source table and to identify their row and column indices accurately. By leveraging the flexible capabilities of modern large language models (LLMs), we develop simple baselines for table verification. Our findings highlight the complexity of this task, even for state-of-the-art LLMs such as OpenAI's GPT-4. The code and benchmark will be made publicly available.
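To make the cell matching objective concrete, the following is a minimal sketch that pairs cells holding the same numeric value in a target and a source table and reports their row and column indices. It is an illustration only, not the LLM-based baselines or the evaluation metrics of this paper: the function names `parse_number` and `match_cells`, the list-of-rows table representation, and the example tables are all assumptions made for this sketch.

```python
from decimal import Decimal, InvalidOperation


def parse_number(cell):
    """Return the cell as a Decimal, or None if it is not a plain finite number."""
    try:
        value = Decimal(cell.strip().replace(",", ""))
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def match_cells(target, source):
    """Pair target and source cells that hold the same numeric value.

    Tables are lists of rows, each row a list of strings (an assumption of
    this sketch). Returns a list of
    ((target_row, target_col), (source_row, source_col)) index pairs.
    """
    # Index every numeric source cell by its value.
    source_index = {}
    for r, row in enumerate(source):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            if value is not None:
                source_index.setdefault(value, []).append((r, c))

    # Look up each numeric target cell in the source index.
    matches = []
    for r, row in enumerate(target):
        for c, cell in enumerate(row):
            value = parse_number(cell)
            if value is None:
                continue
            for source_pos in source_index.get(value, []):
                matches.append(((r, c), source_pos))
    return matches


# Hypothetical tables, invented for illustration.
target = [["Method", "Acc"], ["Ours", "71.3"], ["Baseline", "68.9"]]
source = [["Model", "Top-1", "Top-5"], ["Baseline", "68.9", "88.2"]]

matches = match_cells(target, source)
print(matches)  # [((2, 1), (1, 1))]
```

Exact value matching of this kind is deliberately naive: it cannot tell apart coincidentally equal numbers, and it misses values that were rounded or rescaled when copied, which hints at why the task remains difficult.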