Charts are an essential source of visual information in documents, and they help readers understand and interpret information that is typically conveyed numerically. The scientific literature contains many charts, each with its own stylistic differences. Recently, the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we survey the current state-of-the-art techniques for chart classification and discuss the available datasets and the chart types they support. We broadly group these contributions into traditional ML-based, CNN-based, and Transformer-based approaches. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and Transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset, released for the CHART-Infographics competition at ICPR 2022. The dataset covers 15 chart categories and comprises 22,923 training images and 13,260 test images. We have also implemented a vision-based transformer model that produces state-of-the-art results in chart classification.