Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this "laissez-faire" setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias, to the extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner than outputs that portray them in a representative or empowering manner. We also document a prevalence of stereotypes (e.g., the "perpetual foreigner" stereotype) in LM-generated outputs that are known to trigger psychological harms that disproportionately affect minoritized individuals. These harms include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and to invest in critical AI education programs tailored towards empowering diverse consumers.
