A well-executed graphic design typically achieves harmony at two levels: the fine-grained design elements (color, font, and layout) and the overall design. This complexity makes graphic design challenging to comprehend, because it requires the ability both to recognize the design elements and to understand the design as a whole. Motivated by the rapid development of Multimodal Large Language Models (MLLMs), we establish DesignProbe, a benchmark for investigating the capability of MLLMs in design. The benchmark comprises eight tasks spanning both the fine-grained element level and the overall design level. At the design element level, we consider both attribute recognition and semantic understanding tasks. At the overall design level, we include style and metaphor tasks. We test nine MLLMs and use GPT-4 as the evaluator. Further experiments indicate that refining prompts can enhance the performance of MLLMs. We first have different LLMs rewrite the prompts and find that performance increases when a model's prompts are refined by its own LLM. We then add extra task knowledge in two different ways (text descriptions and image examples) and find that adding images boosts performance much more than adding text.