Text-to-Visualization (Text2VIS) enables users to create visualizations from natural language queries, making data insights more accessible. However, Text2VIS faces challenges in interpreting ambiguous queries, as users often express their visualization needs in imprecise language. To address this challenge, we introduce nvBench 2.0, a new benchmark designed to evaluate Text2VIS systems in scenarios involving ambiguous queries. nvBench 2.0 includes 7,878 natural language queries and 24,076 corresponding visualizations, derived from 780 tables across 153 domains. It is built using a controlled ambiguity-injection pipeline that generates ambiguous queries through a reverse-generation workflow. Starting from unambiguous seed visualizations and selectively injecting ambiguities, the pipeline yields multiple valid interpretations for each query, and each ambiguous query remains traceable to its corresponding visualizations through step-wise reasoning paths. We evaluate various Large Language Models (LLMs) on their ability to perform ambiguous Text2VIS tasks using nvBench 2.0. We also propose Step-Text2Vis, an LLM-based model trained on nvBench 2.0 that improves performance in ambiguous scenarios through step-wise preference optimization. Our results show that Step-Text2Vis outperforms all baselines, setting a new state of the art for ambiguous Text2VIS tasks. Our source code and data are available at https://nvbench2.github.io/