Recent AI-based text-to-image models not only excel at generating realistic images but also give designers increasingly fine-grained control over image content. Consequently, these approaches have garnered increased attention within the computer graphics research community, which has historically been devoted to traditional rendering techniques that offer precise control over scene parameters such as objects, materials, and lighting when generating realistic images. While the quality of rendered images is traditionally assessed through well-established image quality metrics such as SSIM or PSNR, text-to-image models pose unique challenges: in contrast to rendering, they interweave the control of scene and rendering parameters, necessitating the development of novel image quality metrics. Within this survey, we therefore provide a comprehensive overview of existing text-to-image quality metrics, addressing their nuances and the need for alignment with human preferences. Based on our findings, we propose a new taxonomy for categorizing these metrics, grounded in the assumption that there are two main quality criteria, namely compositionality and generality, which ideally map to human preferences. Finally, we derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.