Systematic reviews and meta-analyses frequently require numerical data that authors report only as figures, yet manual digitisation is slow and does not scale. We present PlotPick, an open-source tool that uses vision-language models (VLMs) to batch-extract structured tabular data from scientific figures. We evaluate six VLMs from three providers on two established chart-to-table benchmarks (ChartX and PlotQA) and compare against the dedicated chart-to-table model DePlot. All six VLMs outperform DePlot on both benchmarks. On ChartX (restricted to bar charts, line charts, box plots, and histograms; n=300), VLMs achieve 88-96% recall versus 71% for DePlot. On PlotQA (n=529), VLMs achieve 86-99% RMSF1 versus 94% for DePlot. The gap is largest on chart types absent from the dedicated model's training data: on box plots, DePlot achieves 24% RMSF1 while VLMs achieve 83-97%. PlotPick is available at https://plotpick.streamlit.app.