The Natural Language to Visualization (NL2Vis) task aims to transform natural-language descriptions into visual representations grounded in a given table, enabling users to gain insights from vast amounts of data. Recently, many deep learning-based approaches have been developed for NL2Vis. Despite the considerable efforts made by these approaches, challenges persist in visualizing data sourced from unseen databases or spanning multiple tables. Inspired by the remarkable generation capabilities of Large Language Models (LLMs), this paper conducts an empirical study to evaluate their potential for generating visualizations and explores the effectiveness of in-context learning prompts for this task. In particular, we first explore ways of transforming structured tabular data into sequential text prompts so that they can be fed into LLMs, and we analyze which table content contributes most to NL2Vis. Our findings suggest that transforming structured tabular data into programs is effective and that the table schema must be considered when formulating prompts. Furthermore, we evaluate two types of LLMs, fine-tuned models (e.g., T5-Small) and inference-only models (e.g., GPT-3.5), against state-of-the-art methods on the NL2Vis benchmark nvBench. The experimental results reveal that LLMs outperform the baselines, with inference-only models consistently exhibiting performance gains and at times even surpassing fine-tuned models when provided with certain few-shot demonstrations through in-context learning. Finally, we analyze when LLMs fail at NL2Vis and propose iteratively updating the results using strategies such as chain-of-thought, role-playing, and code interpreters. The experimental results confirm the efficacy of iterative updates and suggest that this direction holds great potential for future study.
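To make the idea of serializing a table "as a program" concrete, here is a minimal sketch. It assumes a SQL-style serialization: the schema is rendered as a `CREATE TABLE` statement, a few sample rows are rendered as `INSERT` statements, and the question is appended as a comment. The exact prompt format used in this study may differ. `Table`, `table_to_program`, and `build_prompt` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory table: a name, typed columns, and sample rows."""
    name: str
    columns: list[tuple[str, str]]  # (column name, SQL type)
    rows: list[tuple] = field(default_factory=list)


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (strings quoted, quotes escaped)."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def table_to_program(table: Table, max_rows: int = 3) -> str:
    """Serialize a table as a SQL program: the schema as CREATE TABLE,
    followed by at most `max_rows` sample rows as INSERT statements."""
    cols = ",\n".join(f"  {name} {typ}" for name, typ in table.columns)
    lines = [f"CREATE TABLE {table.name} (\n{cols}\n);"]
    for row in table.rows[:max_rows]:
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(f"INSERT INTO {table.name} VALUES ({values});")
    return "\n".join(lines)


def build_prompt(tables: list[Table], question: str) -> str:
    """Assemble a zero-shot prompt (assumed format): serialized tables, then the question."""
    schema = "\n\n".join(table_to_program(t) for t in tables)
    return (
        f"{schema}\n\n"
        f"-- Question: {question}\n"
        "-- Write a visualization query that answers the question.\n"
    )


if __name__ == "__main__":
    employees = Table(
        name="employees",
        columns=[("name", "TEXT"), ("dept", "TEXT"), ("salary", "REAL")],
        rows=[("Ann", "Sales", 52000.0), ("Bo", "R&D", 61000.0)],
    )
    print(build_prompt([employees], "Show the average salary per department as a bar chart."))
```

Keeping the schema explicit, as `CREATE TABLE` does here, reflects the finding that the table schema matters when formulating prompts. Capping the number of sample rows keeps the prompt short for large tables.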