Charts are common in the literature of many scientific fields and convey rich information in a form that readers can easily access. Current chart-related tasks focus either on chart perception, which refers to extracting information from visual charts, or on reasoning over the extracted data, e.g., in tabular form. In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks that is generally applicable to different downstream tasks, beyond the question-answering task studied in peer works. Specifically, StructChart first reformulates the chart information from the popular tabular form (linearized CSV) into the proposed Structured Triplet Representations (STR), whose structured information extraction for charts helps reduce the task gap between chart perception and reasoning. We then propose a Structuring Chart-oriented Representation Metric (SCRM) to quantitatively evaluate performance on the chart perception task. To enrich the training dataset, we further explore leveraging a Large Language Model (LLM) to enhance chart diversity in terms of both visual style and statistical information. Extensive experiments on various chart-related tasks demonstrate the effectiveness and promising potential of a unified chart perception-reasoning paradigm for pushing the frontier of chart understanding.
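As an illustration of the reformulation from linearized CSV to triplets, the following is a minimal sketch and not the StructChart implementation. The abstract does not specify the triplet layout, so the (row label, column label, value) form used here is an assumption, and the function name `csv_to_triplets` and the sample table are hypothetical.

```python
import csv
import io


def csv_to_triplets(linearized_csv: str) -> list[tuple[str, str, str]]:
    """Turn a CSV table into (row_label, column_label, value) triplets.

    Assumption: the first row holds column headers and the first column
    holds row labels. This layout is illustrative only.
    """
    rows = [r for r in csv.reader(io.StringIO(linearized_csv.strip())) if r]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    triplets = []
    for row in body:
        row_label = row[0].strip()
        # Pair every remaining cell with the header of its column.
        for column_label, value in zip(header[1:], row[1:]):
            triplets.append((row_label, column_label.strip(), value.strip()))
    return triplets


# Hypothetical chart data, as it might appear in linearized CSV form.
example_csv = """year,sales,profit
2020,10,3
2021,12,4
"""

triplets = csv_to_triplets(example_csv)
for triplet in triplets:
    print(triplet)
```

Under this assumed layout, each triplet carries its own row and column context, so a triplet stays interpretable even when the set is reordered, whereas a cell in a linearized CSV depends on its position.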