In this paper, we present an empirical study that introduces a nuanced evaluation framework for text-to-image (T2I) generative models and apply it to human image synthesis. The framework divides evaluation into two groups: the first assesses image quality, including aesthetics and realism, and the second assesses how well outputs follow the text condition, through concept coverage and fairness. We introduce an aesthetic score prediction model that assesses the visual appeal of generated images. We also present the first dataset of generated human images annotated with low-quality regions, to facilitate automatic defect detection. Our concept coverage analysis probes how accurately models interpret and render text-based concepts, while our fairness analysis reveals biases in model outputs, with an emphasis on gender, race, and age. Although our study is grounded in human imagery, this dual-faceted approach is flexible enough to apply to other forms of image generation. It deepens our understanding of generative models and paves the way toward the next generation of more sophisticated, contextually aware, and ethically attuned generative models. We will soon release our code, the data used to evaluate the generative models, and the dataset annotated with defective regions.