We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image understanding or generic text-to-image synthesis, GDB targets the unique challenges of professional design work: translating communicative intent into structured layouts, rendering typographically faithful text, manipulating layered compositions, producing valid vector graphics, and reasoning about animation. The suite comprises 50 tasks organized along five axes (layout, typography, infographics, template & design semantics, and animation), each evaluated under both understanding and generation settings and grounded in real-world design templates drawn from the LICA layered-composition dataset. We evaluate a set of frontier closed-source models using a standardized metric taxonomy covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Our results show that current models fall short on the core challenges of professional design: spatial reasoning over complex layouts, faithful vector code generation, fine-grained typographic perception, and temporal decomposition of animations all remain largely unsolved. While high-level semantic understanding is within reach, the gap widens sharply as tasks demand precision, structure, and compositional awareness. GDB provides a rigorous, reproducible testbed for tracking progress toward AI systems that can function as capable design collaborators. The full evaluation framework is publicly available.