Chart Question Answering (CQA) benchmarks are essential for evaluating the capability of Multimodal Large Language Models (MLLMs) to interpret visual data. However, current benchmarks focus primarily on general-purpose CQA and fail to adequately capture domain-specific challenges. We introduce DomainCQA, a systematic methodology for constructing domain-specific CQA benchmarks, and demonstrate its effectiveness by developing AstroChart, a CQA benchmark in the field of astronomy. Our evaluation shows that the primary challenge for existing MLLMs lies in chart reasoning and in combining chart information with domain knowledge for deeper analysis and summarization, rather than in domain-specific knowledge itself, highlighting a critical gap in current benchmarks. By providing a scalable and rigorous framework, DomainCQA enables more precise assessment and improvement of MLLMs for domain-specific applications.