LLMs have Visualization Literacy: Now What? Experiments Exploring LLM Visualization Evaluation Capabilities

As Large Language Models (LLMs) become more popular within the visualization community, researchers increasingly leverage them for diverse visualization tasks such as design guideline suggestions and visualization evaluation. However, for LLMs to act as trustworthy and fair evaluators, we argue that they need to possess visualization literacy, be capable of following user instructions, and uphold graphical integrity. We test the latest versions of the most prominent LLMs, specifically Anthropic's Claude (Opus 4.5), OpenAI's Generative Pretrained Transformers (GPT 5.2 Pro), and Google's Gemini (Gemini 3 Flash), on these features and find that while these models now possess visualization literacy, they still struggle with other features necessary for instruction following and graphical integrity. Using a modified Visualization Literacy Assessment Test (VLAT), we show that, in contrast to the findings of prior research, these recent LLMs have achieved greater-than-human levels of visualization literacy. To test the models' ability to follow instructions, we use few-shot and chain-of-thought prompting as proxies for instruction-following tasks when evaluating visualization literacy, and find that these specialized prompting techniques are becoming obsolete with respect to improving visualization literacy. Additionally, to test the models' ability to uphold graphical integrity, we experiment with the inherent ability of LLMs to evaluate misleading visualizations and find that, without specialized or leading prompting techniques, the models struggle to accurately identify whether a visualization is misleading. Our results further break down the performance of each model on these tasks, but taken together, our findings force us to reconsider the current effectiveness of LLMs as visualization evaluators.


