Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the regenerated Visualization Literacy Assessment Test (reVLAT) benchmark with synthetic data, we open-coded 309 erroneous responses from four state-of-the-art models using a barrier-centric strategy adapted from human visualization literacy research. Our analysis yields a taxonomy of MLLM failures, revealing two machine-specific barriers that extend prior frameworks derived from human-participant studies. Results show that models perform well on simple charts but struggle with color-intensive, segment-based visualizations, often failing to form consistent comparative reasoning. Our findings inform the future evaluation and design of reliable AI-driven visualization assistants.