Generating Statistical Charts with Validation-Driven LLM Workflows

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, because many failures become apparent only after rendering and cannot be detected from the data or code alone. Existing chart datasets also rarely provide fully aligned artifacts such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By validating the rendered output, the workflow addresses visualization-specific failure modes such as readability problems and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart together with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value-extraction, comparison, and reasoning questions remain more challenging, illustrating the workflow's utility for diagnostic studies of chart-grounded multimodal reasoning.
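The abstract names the refinement step only as a stage in the pipeline. The following is a minimal Python sketch, under stated assumptions, of how a validation-driven loop over code synthesis, rendering, and validation could be wired together; it is not the authors' implementation. All names (`ChartRecord`, `refine_chart`, the `synthesize`/`render`/`validate` callables, and the `toy_*` stubs) are hypothetical, and the LLM and renderer are replaced by toy stubs so the control flow runs on its own. The point it illustrates is that the validator inspects the rendered output rather than the data or code, and its findings are fed back into the next synthesis round, with every round's feedback kept so the process stays inspectable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ChartRecord:
    """Hypothetical per-chart record: the code plus the context kept alongside it."""
    dataset_name: str
    plot_spec: str
    code: str
    attempts: int
    feedback_history: list[list[str]] = field(default_factory=list)
    description: str = ""  # would be filled by a later description-generation stage
    qa_pairs: list[tuple[str, str]] = field(default_factory=list)  # filled by QA generation


def refine_chart(
    dataset_name: str,
    plot_spec: str,
    synthesize: Callable[[str, list[str]], str],  # stands in for LLM code synthesis
    render: Callable[[str], bytes],  # executes the code and returns the rendered image
    validate: Callable[[bytes, str], list[str]],  # inspects the *rendered* output
    max_rounds: int = 3,
) -> Optional[ChartRecord]:
    """Synthesize, render, and validate until no issues remain or rounds run out."""
    feedback: list[str] = []
    history: list[list[str]] = []
    for attempt in range(1, max_rounds + 1):
        code = synthesize(plot_spec, feedback)
        try:
            image = render(code)
        except Exception as exc:  # a crash is also feedback for the next round
            feedback = [f"render error: {exc}"]
            history.append(feedback)
            continue
        feedback = validate(image, plot_spec)
        history.append(feedback)
        if not feedback:  # no issues found in the rendered chart
            return ChartRecord(dataset_name, plot_spec, code, attempt, history)
    return None  # charts that never pass validation are discarded


# --- Toy stubs (hypothetical) so the loop can run without an LLM or plotting library ---
def toy_synthesize(spec: str, feedback: list[str]) -> str:
    code = f"plot({spec!r})"
    if any("overlapping" in msg for msg in feedback):
        code += "  # rotate x tick labels"  # "fix" suggested by the validator
    return code


def toy_render(code: str) -> bytes:
    return code.encode()  # pretend the encoded code is the rendered image


def toy_validate(image: bytes, spec: str) -> list[str]:
    return [] if b"rotate" in image else ["overlapping x tick labels"]


record = refine_chart(
    "iris", "bar: mean petal length by species", toy_synthesize, toy_render, toy_validate
)
print(record.attempts, record.feedback_history)  # 2 [['overlapping x tick labels'], []]
```

Keeping the stages as injected callables is a design choice of this sketch: a real screening, proposal, or validation model (for example, an MLLM judging readability from the image) can be swapped in without touching the loop itself.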