The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity, and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) generations are less realistic and less diverse when models are prompted for Africa and West Asia than for Europe, (2) including geographic information in the prompt comes at a cost to the prompt-generation consistency and diversity of generated images, and (3) models exhibit larger region-level disparities for some objects than for others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience with visual content creation for everyone.