Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale

The recent surge in research on diffusion models has accelerated the adoption of text-to-image models in various Artificial Intelligence Generated Content (AIGC) commercial products. While these AIGC products are gaining recognition and sparking enthusiasm among consumers, the questions of whether, when, and how these models might unintentionally reinforce existing societal stereotypes remain largely unaddressed. Motivated by recent advances in language agents, we introduce a novel agent architecture tailored for stereotype detection in text-to-image models. This agent architecture accommodates free-form detection tasks and can autonomously invoke various tools to carry out the entire process, from generating the corresponding instructions and images to detecting stereotypes. We build a stereotype-relevant benchmark based on multiple open-text datasets, and apply the architecture to commercial products and popular open-source text-to-image models. We find that these models often display serious stereotypes for certain prompts about personal characteristics, sociocultural context, and crime-related aspects. These empirical findings underscore the pervasive existence of stereotypes across social dimensions, including gender, race, and religion; they not only validate the effectiveness of our proposed approach but also emphasize the need to address potential ethical risks in the rapidly growing realm of AIGC. As AIGC continues to expand rapidly, with new models and plugins emerging daily, the challenge lies in the timely detection and mitigation of potential biases within these models.

