Text documents that contain numerical values are widely used in applications such as scientific research, economics, public health, and journalism. However, readers often find it difficult to interpret such data-involved texts quickly and to gain deep insights from them. To address this problem, this work aims to automatically generate charts that accurately convey the underlying data and ideas to readers. The task is challenging because of text ambiguities, the intrinsic sparsity and uncertainty of data in text documents, and subjective sentiment differences. We propose ChartifyText, a novel fully automated approach that leverages Large Language Models (LLMs) to convert complex data-involved texts into expressive charts. It consists of two major modules: tabular data inference and expressive chart generation. The tabular data inference module employs systematic prompt engineering to guide an LLM (e.g., GPT-4) to infer tabular data, explicitly considering data ranges, uncertainties, missing data values, and the corresponding subjective sentiments. The expressive chart generation module augments standard charts with intuitive visual encodings and concise texts to accurately convey the underlying data and insights. We evaluate ChartifyText on real-world data-involved text documents through case studies, in-depth interviews with three visualization experts, and a carefully designed user study with 15 participants. The results demonstrate the usefulness and effectiveness of ChartifyText in helping readers make sense of data-involved texts.
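To make the output of the tabular data inference module more concrete, the following is a minimal illustrative sketch, not the ChartifyText implementation. It assumes a hypothetical record type, `InferredCell`, whose fields (`value`, `low`, `high`, `uncertain`, `sentiment`) are our own names for the properties listed above: data ranges, uncertainties, missing values, and subjective sentiments. The helper `error_bar` is likewise hypothetical and shows one way such a record could be mapped to an error-bar encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InferredCell:
    """One inferred table cell. All field names are hypothetical."""
    label: str
    value: Optional[float] = None   # point estimate, if the text states one
    low: Optional[float] = None     # lower bound of a stated range
    high: Optional[float] = None    # upper bound of a stated range
    uncertain: bool = False         # text hedges the number ("about", "roughly")
    sentiment: str = "neutral"      # subjective tone attached to the value

    @property
    def missing(self) -> bool:
        # A cell is missing when the text gives no number at all for it.
        return self.value is None and self.low is None and self.high is None

    def center(self) -> Optional[float]:
        # Prefer the point estimate; otherwise use the midpoint of a full range.
        if self.value is not None:
            return self.value
        if self.low is not None and self.high is not None:
            return (self.low + self.high) / 2
        return None


def error_bar(cell: InferredCell) -> Optional[Tuple[float, float, float]]:
    """Return (center, minus, plus) offsets, or None if no center can be drawn."""
    c = cell.center()
    if c is None:
        return None
    lo = cell.low if cell.low is not None else c
    hi = cell.high if cell.high is not None else c
    return (c, c - lo, hi - c)


# Made-up example values, for illustration only.
cells = [
    InferredCell("2021", value=12.0),
    InferredCell("2022", low=10.0, high=14.0, uncertain=True, sentiment="negative"),
    InferredCell("2023"),  # the text mentions the year but gives no number
]
```

In this sketch, a range without a point estimate is drawn at its midpoint with symmetric offsets, and a missing cell yields no bar at all, so a chart layer could mark it separately rather than plotting a misleading zero.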