The recent surge of research on diffusion models has accelerated the adoption of text-to-image models in a range of commercial Artificial Intelligence Generated Content (AIGC) products. While these AIGC products are gaining recognition and sparking enthusiasm among consumers, the questions of whether, when, and how these models might unintentionally reinforce existing societal stereotypes remain largely unaddressed. Motivated by recent advances in language agents, we introduce a novel agent architecture tailored for stereotype detection in text-to-image models. This versatile architecture accommodates free-form detection tasks and can autonomously invoke various tools to carry out the entire process, from generating the corresponding instructions and images to detecting stereotypes. We build a stereotype-relevant benchmark from multiple open-text datasets and apply the architecture to commercial products and popular open-source text-to-image models. We find that these models often display serious stereotypes for certain prompts concerning personal characteristics, sociocultural context, and crime-related aspects. These empirical findings underscore the pervasive existence of stereotypes across social dimensions, including gender, race, and religion; they not only validate the effectiveness of our proposed approach but also emphasize the critical necessity of addressing potential ethical risks in the burgeoning realm of AIGC. As AIGC continues its rapid expansion, with new models and plugins emerging daily in large numbers, the challenge lies in the timely detection and mitigation of potential biases within these models.