We present a comprehensive study of the chart visual question-answering (QA) task, addressing the challenges of comprehending and extracting data from chart visualizations within documents. Prior efforts have tackled this problem with synthetic charts, but such solutions are limited by the shortage of annotated real-world data. To fill this gap, we introduce a benchmark and dataset for chart visual QA on real-world charts, together with a systematic analysis of the task and a novel taxonomy for template-based chart question creation. Our contributions include a new answer type, 'list', with both ranked and unranked variations. The study is conducted on a real-world chart dataset drawn from scientific literature, which exhibits higher visual complexity than the datasets used in other works. We focus on template-based QA and how it can serve as a standard for evaluating the first-order logic capabilities of models. Our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models and advance the field of chart visual QA and, more generally, formal logic verification for neural networks.