Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfully do these models actually describe charts? Current benchmarks fall short on two fronts: existing datasets consist of simple, homogeneous charts paired with shallow, fact-enumerating descriptions; and prevailing metrics fail to capture the multi-faceted nature of description quality. To address these gaps, we present the Chart Faithfulness and Insightfulness Benchmark (ChartFI-Bench). We first summarize four dimensions that characterize high-quality chart descriptions: factual accuracy, salient feature emphasis, domain-informed guidance, and chart-text complementarity. Guided by these dimensions, we construct a high-quality benchmark comprising 896 chart-description pairs, which feature visually complex charts and semantically rich descriptions. Furthermore, we design four aligned evaluation metrics -- Faithfulness, Coverage, Informativeness, and Acuity -- to systematically assess the quality of descriptions across these dimensions. Experiments conducted on mainstream MLLMs demonstrate the effectiveness of the proposed framework and reveal common weaknesses among existing models.