As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31 percentage points) and Black (-9 pp) workers are consistently underrepresented, while Hispanic (+17 pp) and Asian (+12 pp) workers are overrepresented, with stereotype exaggeration amplifying existing occupational segregation. These distortions are often extreme, including near-total portrayals of housekeepers as Hispanic and the near-erasure of Black workers from many occupations. Because these patterns recur across models with different institutional and cultural origins, they suggest shared structural sources of bias rather than model-specific artifacts. We argue that auditing generative AI requires evaluation frameworks that examine how synthetic populations systematically reshape demographic visibility across social roles.