Chart question answering (ChartQA) tasks play a critical role in interpreting and extracting insights from visualization charts. While recent advancements in multimodal large language models (MLLMs), such as GPT-4o, have shown promise on high-level ChartQA tasks such as chart captioning, their effectiveness on low-level ChartQA tasks (e.g., identifying correlations) remains underexplored. In this paper, we address this gap by evaluating MLLMs on low-level ChartQA using a newly curated dataset, ChartInsights, which consists of 22,347 (chart, task, query, answer) quadruplets covering 10 data analysis tasks across 7 chart types. We systematically evaluate 19 advanced MLLMs, including 12 open-source and 7 closed-source models. Their average accuracy is 39.8%, with GPT-4o achieving the highest accuracy at 69.17%. To further probe the limitations of MLLMs on low-level ChartQA, we conduct experiments that alter the visual elements of charts (e.g., changing color schemes, adding image noise) and assess their impact on task performance. Furthermore, we propose a new textual prompt strategy, Chain-of-Charts, tailored to low-level ChartQA tasks, which boosts performance by 14.41 percentage points, achieving an accuracy of 83.58%. Finally, incorporating a visual prompt strategy that directs attention to relevant visual elements further improves accuracy to 84.32%.