The recent surge in research on diffusion models has accelerated the adoption of text-to-image models in a range of commercial Artificial Intelligence Generated Content (AIGC) products. While these AIGC products are gaining recognition and sparking enthusiasm among consumers, the questions of whether, when, and how these models might unintentionally reinforce existing societal stereotypes remain largely unaddressed. Motivated by recent advances in language agents, we introduce a novel agent architecture tailored for stereotype detection in text-to-image models. The architecture accommodates free-form detection tasks and can autonomously invoke various tools to carry out the entire process, from generating the corresponding instructions and images to detecting stereotypes. We build a stereotype-relevant benchmark from multiple open-text datasets and apply the architecture to commercial products and popular open-source text-to-image models. We find that these models often display serious stereotypes for certain prompts concerning personal characteristics, sociocultural context, and crime-related aspects. These empirical findings underscore the pervasive existence of stereotypes across social dimensions, including gender, race, and religion; this not only validates the effectiveness of our proposed approach but also emphasizes the critical necessity of addressing potential ethical risks in the burgeoning realm of AIGC. As AIGC continues to expand rapidly, with new models and plugins emerging daily in staggering numbers, the challenge lies in the timely detection and mitigation of potential biases within these models.