Data visualization is a critical means of presenting data and extracting valuable insights from it. Chart summarization, which applies natural language processing techniques to charts, supports in-depth analysis of the data they contain. However, existing approaches still show notable deficiencies in visual-language matching and reasoning ability. To address these limitations, this study constructs a large-scale dataset of comprehensive chart-caption pairs, together with fine-tuning instructions for each chart. Because the dataset covers a broad range of topics and visual styles, it improves visual-language matching at the level of the training data. Moreover, we propose an innovative chart summarization method, ChartThinker, which combines chain-of-thought analysis with context retrieval strategies to improve the logical coherence and accuracy of the generated summaries. Trained on the curated dataset, our model consistently exhibits superior performance in chart summarization tasks, surpassing 8 state-of-the-art models over 7 evaluation metrics. Our dataset and code are publicly accessible.