ChartifyText: Automated Chart Generation from Data-Involved Texts via LLM

Text documents that contain numerical values are widely used in domains such as scientific research, economics, public health, and journalism. However, readers often find it difficult to interpret such data-involved texts quickly and to gain deep insights from them. To address this problem, this work aims to automatically generate charts that accurately convey the underlying data and ideas to readers. The task is challenging because of text ambiguities, the intrinsic sparsity and uncertainty of data in text documents, and subjective sentiment differences. We propose ChartifyText, a novel, fully automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts into expressive charts. It consists of two major modules: tabular data inference and expressive chart generation. The tabular data inference module employs systematic prompt engineering to guide an LLM (e.g., GPT-4) to infer tabular data, explicitly considering data ranges, uncertainties, missing data values, and the corresponding subjective sentiments. The expressive chart generation module augments standard charts with intuitive visual encodings and concise text to accurately convey the underlying data and insights. We evaluate ChartifyText on real-world data-involved text documents through case studies, in-depth interviews with three visualization experts, and a carefully designed user study with 15 participants. The results demonstrate the usefulness and effectiveness of ChartifyText in helping readers make sense of data-involved texts efficiently and effectively.
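To make the output of the tabular data inference module more concrete, the following is a minimal sketch of what one inferred table row might look like. It is not the schema used by ChartifyText: the class `InferredCell`, its field names, and the example sentence are all hypothetical, chosen only to show how a range, an uncertainty flag, a missing value, and a sentiment label could sit side by side in one record.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InferredCell:
    """One value inferred from a text (hypothetical schema, not the paper's)."""
    label: str                                         # category, e.g. a year
    value: Optional[float] = None                      # exact number, if stated
    value_range: Optional[Tuple[float, float]] = None  # (low, high), if a range is stated
    uncertain: bool = False                            # hedged wording such as "roughly"
    sentiment: str = "neutral"                         # "positive", "negative", or "neutral"

    @property
    def is_missing(self) -> bool:
        # A cell is missing when the text gives neither a number nor a range.
        return self.value is None and self.value_range is None

    def plot_value(self) -> Optional[float]:
        # Prefer the exact value; otherwise fall back to the range midpoint.
        if self.value is not None:
            return self.value
        if self.value_range is not None:
            low, high = self.value_range
            return (low + high) / 2
        return None

    def error_bar(self) -> float:
        # Half-width of the stated range; zero when no range is given.
        if self.value_range is None:
            return 0.0
        low, high = self.value_range
        return (high - low) / 2


# Invented example sentence, for illustration only:
# "Sales recovered to roughly 40 to 50 million in 2022, up from 30 million
#  in 2021; figures for 2020 were not reported."
rows = [
    InferredCell("2020"),
    InferredCell("2021", value=30.0),
    InferredCell("2022", value_range=(40.0, 50.0), uncertain=True, sentiment="positive"),
]

for row in rows:
    print(row.label, row.plot_value(), row.error_bar(), row.is_missing, row.sentiment)
```

A chart generator could then map `error_bar()` to an error bar, `is_missing` to a placeholder mark, and `sentiment` to a color or annotation. These mappings are illustrative assumptions as well, not the visual encodings chosen by the authors.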
