Diagrams play a central role in conveying the ideas of research papers, yet they are notoriously complex and labor-intensive to create. Although diagrams are presented as images, standard generative image models struggle to produce clear diagrams with well-defined structure. We argue that a promising direction is to generate demonstration diagrams directly in textual form, as SVG, which can leverage recent advances in large language models (LLMs). However, due to the complexity of diagram components and the multimodal nature of diagrams, sufficiently discriminative and explainable metrics for evaluating the quality of LLM-generated diagrams are still lacking. In this paper, we propose DiagramEval, a novel set of evaluation metrics designed to assess demonstration diagrams generated by LLMs. Specifically, DiagramEval conceptualizes a diagram as a graph, treating text elements as nodes and their connections as directed edges, and evaluates diagram quality with two new groups of metrics: node alignment and path alignment. For the first time, we effectively evaluate diagrams produced by state-of-the-art LLMs on recent research literature, quantitatively demonstrating the validity of our metrics. Furthermore, we show how the enhanced explainability of the proposed metrics offers valuable insights into the characteristics of LLM-generated diagrams. Code: https://github.com/ulab-uiuc/diagram-eval.
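To make the graph view concrete, below is a minimal sketch of how such alignment scores could be computed. It is an illustration only, not the paper's actual metric definitions: the `Diagram` structure, the lexical `text_sim` matcher, the 0.7 threshold, and the recall-style scoring are all assumptions made for this example (see the repository above for the real implementation).

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Diagram:
    """A diagram viewed as a graph: text elements are nodes,
    and a connection from node i to node j is a directed edge (i, j)."""
    nodes: list[str]
    edges: list[tuple[int, int]] = field(default_factory=list)


def text_sim(a: str, b: str) -> float:
    # Crude lexical similarity in [0, 1]; a stand-in for whatever
    # text matcher the actual metric uses.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def node_alignment(gen: Diagram, ref: Diagram, thresh: float = 0.7) -> float:
    """Fraction of reference nodes matched by some generated node."""
    if not ref.nodes:
        return 1.0
    matched = sum(
        1 for r in ref.nodes if any(text_sim(g, r) >= thresh for g in gen.nodes)
    )
    return matched / len(ref.nodes)


def path_alignment(gen: Diagram, ref: Diagram, thresh: float = 0.7) -> float:
    """Fraction of reference edges whose endpoint texts both align
    with the endpoints of some generated edge."""
    if not ref.edges:
        return 1.0

    def edge_match(g_edge, r_edge):
        g_src, g_dst = gen.nodes[g_edge[0]], gen.nodes[g_edge[1]]
        r_src, r_dst = ref.nodes[r_edge[0]], ref.nodes[r_edge[1]]
        return text_sim(g_src, r_src) >= thresh and text_sim(g_dst, r_dst) >= thresh

    matched = sum(1 for r in ref.edges if any(edge_match(g, r) for g in gen.edges))
    return matched / len(ref.edges)


if __name__ == "__main__":
    ref = Diagram(nodes=["Input paper", "LLM", "SVG diagram"], edges=[(0, 1), (1, 2)])
    gen = Diagram(nodes=["input  Paper", "LLM", "svg  diagram"], edges=[(0, 1), (1, 2)])
    print(f"node alignment: {node_alignment(gen, ref):.2f}")  # 1.00
    print(f"path alignment: {path_alignment(gen, ref):.2f}")  # 1.00
```

Recall-style scores like these reward covering the reference structure; a full metric would presumably pair them with precision-style counterparts and a stronger matcher (e.g., embedding similarity) than the lexical one used here.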