Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts. The dataset is constructed through an inductive process that identifies 75 chart types, 440 chart variations, and 68 layout templates from real infographic charts and uses them to create synthetic ones programmatically. We showcase the utility of this dataset through three applications: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation. By capturing the visual and structural complexity of real designs, ChartGalaxy provides a useful resource for enhancing multimodal reasoning and generation in LVLMs.