Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures of these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics that evaluate full images and therefore cannot attribute the disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with greater realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation for Africa, difficulty generating modern vehicles for Africa, and unrealistic placement of some objects in outdoor settings. Informed by our metrics, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.
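The abstract reports a worst-region improvement and an average improvement but does not spell out how either is computed. One plausible reading, sketched below, is a relative change in the lowest per-region score and in the mean per-region score. The function names, region labels, and scores are hypothetical placeholders for illustration, not values or code from the paper.

```python
from typing import Dict


def relative_improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, e.g. 0.5 means +50%."""
    return (after - before) / before


def summarize_improvement(
    baseline: Dict[str, float], improved: Dict[str, float]
) -> Dict[str, float]:
    """Compare per-region diversity scores under two prompting structures.

    Assumption: "worst-region" compares the minimum score before with the
    minimum score after, even if a different region is the worst each time.
    """
    worst_before = min(baseline.values())
    worst_after = min(improved.values())
    mean_before = sum(baseline.values()) / len(baseline)
    mean_after = sum(improved.values()) / len(improved)
    return {
        "worst_region": relative_improvement(worst_before, worst_after),
        "average": relative_improvement(mean_before, mean_after),
    }


# Hypothetical per-region background diversity scores (not from the paper).
baseline_scores = {"region_a": 0.20, "region_b": 0.40, "region_c": 0.60}
improved_scores = {"region_a": 0.30, "region_b": 0.45, "region_c": 0.63}

summary = summarize_improvement(baseline_scores, improved_scores)
print(f"worst-region improvement: {summary['worst_region']:.0%}")
print(f"average improvement: {summary['average']:.0%}")
```

Reporting the worst-region figure alongside the average makes it visible when a gain is concentrated in the regions that were served worst to begin with, which an average alone can hide.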