From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has advanced significantly in recent years with the rise of large foundation models. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey provides a comprehensive overview of recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review the fundamental building blocks crucial for studying chart understanding tasks. We then explore the various tasks, their evaluation metrics, and the sources of both charts and textual inputs. Next, we examine modeling strategies, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. We also discuss the state-of-the-art performance on each task and how it can be improved. Finally, we address challenges and future directions, highlighting topics such as domain-specific charts, the lack of effort in developing evaluation metrics, and agent-oriented settings. This survey serves as a comprehensive resource for researchers and practitioners in natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding with large foundation models. The studies mentioned in this paper, along with emerging research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.
