Recent large-scale T2I models like DALLE-3 have made progress in reducing gender stereotypes when generating single-person images. However, significant biases remain when generating images with more than one person. To systematically evaluate this, we propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned male-stereotyped and female-stereotyped social identities, respectively (e.g., "a CEO" and "an Assistant"). This contrastive setting often triggers T2I models to generate gender-stereotyped images. Using PST, we evaluate two aspects of gender bias -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power. Experiments show that over 74% of images generated by DALLE-3 display gender-occupational biases. Additionally, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. We further propose FairCritic, a novel and interpretable framework that leverages an LLM-based critic model to i) detect bias in generated images, and ii) adaptively provide feedback to T2I models for improving fairness. FairCritic achieves near-perfect fairness on PST, overcoming the limitations of previous prompt-based intervention approaches.
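The core of the PST setting is pairing a male-stereotyped identity with a female-stereotyped one in a single two-person prompt. A minimal sketch of how such paired prompts could be constructed is below; the identity lists and prompt wording are illustrative placeholders, not the paper's actual test set or template.

```python
# Hypothetical identity lists -- placeholders, not the paper's actual test set.
MALE_STEREOTYPED = ["a CEO", "a doctor", "an engineer"]
FEMALE_STEREOTYPED = ["an Assistant", "a nurse", "a designer"]

def build_pst_prompts(male_ids, female_ids):
    """Pair each male-stereotyped identity with each female-stereotyped
    identity, producing one two-person image prompt per pair."""
    prompts = []
    for m in male_ids:
        for f in female_ids:
            # Both identities appear in one contrastive prompt, so the
            # T2I model must assign a gender to each depicted person.
            prompts.append(f"A photo of two people standing together: {m} and {f}.")
    return prompts

prompts = build_pst_prompts(MALE_STEREOTYPED, FEMALE_STEREOTYPED)
print(len(prompts))   # 3 x 3 identity pairs -> 9 prompts
print(prompts[0])
```

Each generated image would then be inspected (in the paper, via an LLM-based critic) to check which gender the model assigned to each identity in the pair.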