Chart descriptions are essential for accessibility, cross-modal retrieval, and helping readers extract insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfully do these models actually describe charts? Current benchmarks fall short on two fronts: their datasets consist of simple, homogeneous charts paired with shallow, fact-enumerating descriptions, and their metrics fail to capture the multi-faceted nature of description quality. To address these gaps, we present the Chart Faithfulness and Insightfulness Benchmark (ChartFI-Bench). We first identify four dimensions that characterize high-quality chart descriptions: factual accuracy, salient feature emphasis, domain-informed guidance, and chart-text complementarity. Guided by these dimensions, we construct a benchmark of 896 chart-description pairs featuring visually complex charts and semantically rich descriptions. We further design four aligned evaluation metrics (Faithfulness, Coverage, Informativeness, and Acuity) to systematically assess description quality along these dimensions. Experiments on mainstream MLLMs demonstrate the effectiveness of the proposed framework and reveal common weaknesses among existing models.