Recent advances in foundation models show promising capabilities in graphic design generation. Several studies have begun employing Large Multimodal Models (LMMs) to evaluate graphic designs, on the assumption that LMMs can properly assess their quality, but it remains unclear whether such evaluation is reliable. One way to evaluate the quality of a graphic design is to assess whether it adheres to fundamental graphic design principles, which reflect designers' common practice. In this paper, we compare the behavior of GPT-based evaluation and heuristic evaluation based on design principles, using human annotations collected from 60 subjects. Our experiments reveal that, while GPTs cannot distinguish fine-grained details, they correlate reasonably well with human annotations and exhibit a tendency similar to that of heuristic metrics based on design principles, suggesting that they are indeed capable of assessing the quality of graphic design. Our dataset is available at https://cyberagentailab.github.io/Graphic-design-evaluation .