ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

Chart descriptions are essential for accessibility, cross-modal retrieval, and helping readers extract insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfully do these models actually describe charts? Current benchmarks fall short on two fronts: existing datasets pair simple, homogeneous charts with shallow, fact-enumerating descriptions, and prevailing metrics fail to capture the multi-faceted nature of description quality. To address these gaps, we present the Chart Faithfulness and Insightfulness Benchmark (ChartFI-Bench). We first identify four dimensions that characterize high-quality chart descriptions: factual accuracy, salient feature emphasis, domain-informed guidance, and chart-text complementarity. Guided by these dimensions, we construct a high-quality benchmark of 896 chart-description pairs featuring visually complex charts and semantically rich descriptions. We further design four aligned evaluation metrics -- Faithfulness, Coverage, Informativeness, and Acuity -- to systematically assess description quality along these dimensions. Experiments on mainstream MLLMs demonstrate the effectiveness of the proposed framework and reveal common weaknesses among existing models.

