Large Language Models (LLMs) have become a cornerstone of automated visualization code generation, enabling users to create charts through natural language instructions. Despite improvements from techniques like few-shot prompting and query expansion, existing methods often struggle when requests are underspecified in actionable details (e.g., data preprocessing assumptions, solver or library choices), frequently necessitating manual intervention. To overcome these limitations, we propose VisPath: a Multi-Path Reasoning and Feedback-Driven Optimization Framework for Visualization Code Generation. VisPath handles underspecified queries through structured, multi-stage processing. It begins by using Chain-of-Thought (CoT) prompting to reformulate the initial user input, generating multiple extended queries in parallel to surface alternative plausible concretizations of the request. Each extended query is then used to generate a candidate visualization script, and the scripts are executed to produce a diverse set of images. By assessing the visual quality and correctness of each output, VisPath produces targeted feedback, which is aggregated to synthesize an optimal final result. Extensive experiments on MatPlotBench and the Qwen-Agent Code Interpreter Benchmark show that VisPath outperforms state-of-the-art methods, providing a more reliable framework for AI-driven visualization generation.
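The multi-stage pipeline described above can be sketched in outline. This is a minimal, hypothetical illustration under our own assumptions, not the paper's actual implementation: the function `vispath`, the prompt templates, and the stubbed `llm` callable are all placeholders, and real execution/rendering of candidate scripts is elided in favor of an LLM critique step.

```python
# Hypothetical sketch of a VisPath-style multi-path pipeline.
# All names and prompts here are illustrative assumptions, not the paper's API.
from typing import Callable, List


def vispath(user_query: str, llm: Callable[[str], str], n_paths: int = 3) -> str:
    # 1) Reformulate the underspecified query into several concrete variants
    #    (the paper uses Chain-of-Thought prompting for this step).
    expanded: List[str] = [
        llm(f"Expand (variant {i}): {user_query}") for i in range(n_paths)
    ]
    # 2) Generate one candidate visualization script per expanded query.
    candidates = [llm(f"Write plotting code for: {q}") for q in expanded]
    # 3) The paper executes each candidate and assesses the rendered chart;
    #    here we simply ask the LLM for targeted feedback on each candidate.
    feedback = [llm(f"Critique the chart produced by: {c}") for c in candidates]
    # 4) Aggregate all feedback to synthesize a single final script.
    combined = "\n".join(
        f"Candidate:\n{c}\nFeedback:\n{f}" for c, f in zip(candidates, feedback)
    )
    return llm(f"Synthesize the best final script from:\n{combined}")


# Toy stand-in LLM so the sketch runs end to end without any model.
def toy_llm(prompt: str) -> str:
    return f"<response to: {prompt.splitlines()[0][:40]}>"


final_script = vispath("plot monthly sales", toy_llm, n_paths=2)
```

In a real system the per-candidate feedback would come from executing each script and scoring the resulting image, so the aggregation step sees grounded signals rather than text-only critiques.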