Real-world visualization tasks involve complex, multi-modal requirements that extend beyond simple text-to-chart generation and often draw on reference images, code examples, and iterative refinement. Current systems exhibit fundamental limitations: single-modality input, one-shot generation, and rigid workflows. While LLM-based approaches show potential for these complex requirements, they introduce reliability challenges, including catastrophic failures and susceptibility to infinite loops. To address this gap, we propose MultiVis-Agent, a logic rule-enhanced multi-agent framework for reliable multi-modal and multi-scenario visualization generation. Our approach introduces a four-layer logic rule framework that provides mathematical guarantees for system reliability while maintaining flexibility. Unlike traditional rule-based systems, our logic rules are mathematical constraints that guide LLM reasoning rather than replace it. We formalize the MultiVis task, which spans four scenarios from basic generation to iterative refinement, and develop MultiVis-Bench, a benchmark with over 1,000 cases for multi-modal visualization evaluation. Extensive experiments demonstrate that our approach achieves a visualization score of 75.63% on challenging tasks, significantly outperforming baselines (57.54% to 62.79%). It reaches a task completion rate of 99.58% and a code execution success rate of 94.56%, compared with 74.48% and 65.10%, respectively, without logic rules. These results indicate that the approach addresses both the complexity and the reliability challenges of automated visualization generation.
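To make the size of the reported improvements easier to read, the short sketch below computes the percentage-point gaps implied by the figures quoted in the abstract. The helper name `point_gain` and the dictionary layout are illustrative assumptions and are not part of MultiVis-Agent. The numbers are copied from the text, and the baseline range is treated as its lower and upper bounds.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# `point_gain` and the variable names are hypothetical helpers, not the authors' code.

def point_gain(ours: float, reference: float) -> float:
    """Return the difference in percentage points, rounded to two decimals."""
    return round(ours - reference, 2)

# (with logic rules, without logic rules), both in percent, as reported.
ablation = {
    "task completion rate": (99.58, 74.48),
    "code execution success rate": (94.56, 65.10),
}

# Visualization score of the proposed approach and the reported baseline range.
visualization_score = 75.63
baseline_low, baseline_high = 57.54, 62.79

if __name__ == "__main__":
    for metric, (with_rules, without_rules) in ablation.items():
        gain = point_gain(with_rules, without_rules)
        print(f"{metric}: +{gain:.2f} points with logic rules")

    # Gap to the strongest and weakest reported baselines.
    print(f"vs. best baseline: +{point_gain(visualization_score, baseline_high):.2f} points")
    print(f"vs. weakest baseline: +{point_gain(visualization_score, baseline_low):.2f} points")
```

These are absolute differences in percentage points, not relative improvements. The abstract does not state which baseline corresponds to which end of the range.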