Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale

The recent surge in research on diffusion models has accelerated the adoption of text-to-image models in various Artificial Intelligence Generated Content (AIGC) commercial products. While these AIGC products are gaining increasing recognition and sparking enthusiasm among consumers, the questions of whether, when, and how these models might unintentionally reinforce existing societal stereotypes remain largely unaddressed. Motivated by recent advancements in language agents, here we introduce a novel agent architecture tailored for stereotype detection in text-to-image models. This versatile agent architecture can accommodate free-form detection tasks and can autonomously invoke various tools to facilitate the entire process, from generating corresponding instructions and images to detecting stereotypes. We build a stereotype-relevant benchmark based on multiple open-text datasets, and apply this architecture to commercial products and popular open-source text-to-image models. We find that these models often display serious stereotypes for certain prompts about personal characteristics, sociocultural context, and crime-related aspects. In summary, these empirical findings underscore the pervasive existence of stereotypes across social dimensions, including gender, race, and religion. They not only validate the effectiveness of our proposed approach, but also emphasize the critical necessity of addressing potential ethical risks in the burgeoning realm of AIGC. As AIGC continues its rapid expansion, with new models and plugins emerging daily in staggering numbers, the challenge lies in the timely detection and mitigation of potential biases within these models.

