Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it remains unclear whether they and their vision-language counterparts (VLMs) can reason about and enforce visualization principles directly. Constraint-based systems encode these principles as logical rules for precise automated checks, but translating the principles into formal specifications demands expert knowledge. This motivates leveraging LLMs and VLMs as principle checkers that can reason about visual design directly, bypassing the need for symbolic rule specification. In this paper, we present the first systematic evaluation of both LLMs and VLMs on their ability to reason about visualization principles, using hard-verified ground truth derived from Answer Set Programming (ASP). We compiled a set of visualization principles expressed as natural-language statements and generated a controlled dataset of approximately 2,000 Vega-Lite specifications annotated with explicit principle violations, complemented by over 300 real-world Vega-Lite charts. We evaluated both checking and fixing tasks, assessing how well models detect principle violations and correct flawed chart specifications. Our results highlight both the promise of large (vision-)language models as flexible validators and editors of visualization designs and the persistent gap with symbolic solvers on more nuanced aspects of visual perception. They also reveal an interesting asymmetry: frontier models tend to be more effective at correcting violations than at detecting them reliably.
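To make the checking and fixing tasks concrete, the following is a minimal sketch of what a hand-written rule for a single principle might look like when applied to a Vega-Lite specification. It is an illustration only, not the paper's ASP encoding or evaluation pipeline: the chosen principle (bar charts should keep a zero baseline on their quantitative axis) and the function names `bar_without_zero_baseline` and `fix_zero_baseline` are assumptions introduced here.

```python
import copy
from typing import Any, Dict, Optional

Spec = Dict[str, Any]


def mark_type(spec: Spec) -> Optional[str]:
    """Return the mark type; Vega-Lite allows a string or an object with 'type'."""
    mark = spec.get("mark")
    if isinstance(mark, dict):
        return mark.get("type")
    return mark


def bar_without_zero_baseline(spec: Spec) -> bool:
    """Checking task (illustrative rule): flag a bar chart whose quantitative
    position channel explicitly disables the zero baseline."""
    if mark_type(spec) != "bar":
        return False
    encoding = spec.get("encoding", {})
    for channel in ("x", "y"):
        enc = encoding.get(channel) or {}
        if enc.get("type") != "quantitative":
            continue
        scale = enc.get("scale") or {}  # 'scale' may be absent or null
        if scale.get("zero") is False:
            return True
    return False


def fix_zero_baseline(spec: Spec) -> Spec:
    """Fixing task (illustrative): return a copy with the zero baseline restored."""
    fixed = copy.deepcopy(spec)
    if mark_type(fixed) != "bar":
        return fixed
    for channel in ("x", "y"):
        enc = fixed.get("encoding", {}).get(channel) or {}
        scale = enc.get("scale")
        if (
            enc.get("type") == "quantitative"
            and isinstance(scale, dict)
            and scale.get("zero") is False
        ):
            scale["zero"] = True
    return fixed


# Hypothetical example specification with an explicit violation.
flawed = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative", "scale": {"zero": False}},
    },
}

repaired = fix_zero_baseline(flawed)
```

A rule like this is precise but has to be written by hand for every principle, which is the expert cost the abstract refers to; the evaluated models are instead given the principle as a natural-language statement.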