We introduce the task of text-to-diagram generation, which focuses on creating structured visual representations directly from textual descriptions. Existing text-to-image and text-to-code approaches lack the logical organization and flexibility needed to produce accurate, editable diagrams, and their outputs are often either unstructured or difficult to modify. To address this gap, we present DiagramGenBenchmark, a comprehensive evaluation framework encompassing eight distinct diagram categories, including flowcharts, model architecture diagrams, and mind maps. We also present DiagramAgent, a framework with four core modules (Plan Agent, Code Agent, Check Agent, and Diagram-to-Code Agent) designed to support both the generation and refinement of complex diagrams. Our extensive experiments, which combine objective metrics with human evaluations, demonstrate that DiagramAgent significantly outperforms existing baseline models in accuracy, structural coherence, and modifiability. This work establishes a foundational benchmark for the text-to-diagram generation task and provides a toolset to advance research and applications in this emerging area.