From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has advanced significantly with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding. This survey provides a comprehensive overview of recent developments, challenges, and future directions in chart understanding in the context of these foundation models. We review the fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore the various tasks, their evaluation metrics, and the sources of both charts and textual inputs. We then examine modeling strategies, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance on each task and how that performance can be improved. Finally, we address challenges and future directions, highlighting several important topics such as domain-specific charts, the lack of effort devoted to developing evaluation metrics, and agent-oriented settings. This survey serves as a comprehensive resource for researchers and practitioners in natural language processing, computer vision, and data analysis, providing insights and directions for future research on chart understanding with large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.
