In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: the first focuses on image qualities such as aesthetics and realism, while the second examines text conditions through concept coverage and fairness. We introduce a novel aesthetic score prediction model that assesses the visual appeal of generated images, and we release the first dataset annotated with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration of concept coverage probes the model's effectiveness in accurately interpreting and rendering text-based concepts, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed to be flexible enough to apply to other forms of image generation, enhancing our understanding of generative models and paving the way toward the next generation of more sophisticated, contextually aware, and ethically attuned generative models. Code and data, including the dataset annotated with defective areas, are available at \href{https://github.com/cure-lab/EvaluateAIGC}{https://github.com/cure-lab/EvaluateAIGC}.