To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e., natural language descriptions, of the target demographic group alongside personas of unmarked, default groups; 2) identifying the words that significantly distinguish personas of the target group from corresponding unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. The words distinguishing personas of marked (non-white, non-male) groups reflect patterns of othering and exoticizing these demographics. An intersectional lens further reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women. These representational harms have concerning implications for downstream applications like story generation.
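Step 2 of the method can be sketched as follows. The abstract does not name the statistic used to decide which words "significantly distinguish" the target personas from the unmarked ones. This sketch assumes the weighted log-odds ratio with an informative Dirichlet prior (often called "Fightin' Words"), which scores each word with a z-score. Several details are illustrative assumptions rather than the paper's exact pipeline: the function names `log_odds_z` and `marked_words`, the whitespace tokenizer, the use of the combined corpus as the prior, the 1.96 cutoff, and the toy persona strings.

```python
# Hypothetical sketch of step 2: find words that distinguish marked-group
# personas from unmarked-group personas. Assumed statistic: weighted log-odds
# ratio with an informative Dirichlet prior; not the authors' exact code.
import math
from collections import Counter


def _counts(docs):
    # Naive lowercase whitespace tokenization (an assumption; a real pipeline
    # would use a proper tokenizer and handle punctuation).
    return Counter(w for d in docs for w in d.lower().split())


def log_odds_z(target_docs, unmarked_docs, prior_docs=None):
    """Return {word: z}. z > 0 means the word is more associated with
    target_docs (the marked group); z < 0 means unmarked_docs."""
    y_i, y_j = _counts(target_docs), _counts(unmarked_docs)
    # Assumed default: the prior is the combined corpus, so every word that
    # occurs in either group gets a positive pseudo-count alpha_w.
    alpha = _counts(prior_docs if prior_docs is not None
                    else list(target_docs) + list(unmarked_docs))
    n_i, n_j = sum(y_i.values()), sum(y_j.values())
    a_0 = sum(alpha.values())
    z = {}
    for w in set(y_i) | set(y_j):
        a_w = alpha[w]
        if a_w == 0:
            continue  # word missing from a custom prior; skip it
        # Difference in smoothed log-odds of w between the two groups.
        delta = (math.log((y_i[w] + a_w) / (n_i + a_0 - y_i[w] - a_w))
                 - math.log((y_j[w] + a_w) / (n_j + a_0 - y_j[w] - a_w)))
        # Approximate variance of delta; z-score = delta / std.
        var = 1.0 / (y_i[w] + a_w) + 1.0 / (y_j[w] + a_w)
        z[w] = delta / math.sqrt(var)
    return z


def marked_words(target_docs, unmarked_docs, threshold=1.96):
    """Words whose z-score exceeds `threshold`, most distinctive first."""
    z = log_odds_z(target_docs, unmarked_docs)
    return sorted((w for w, s in z.items() if s > threshold),
                  key=lambda w: -z[w])


if __name__ == "__main__":
    # Toy stand-ins for generated personas (hypothetical, not from the paper).
    target = ["her almond eyes and vibrant culture",
              "proud of her vibrant heritage"]
    unmarked = ["he enjoys hiking and reading",
                "he works as an engineer and enjoys reading"]
    # The toy corpus is tiny, so the cutoff is lowered for illustration.
    print(marked_words(target, unmarked, threshold=0.5))
```

With only two toy personas per group, no word reaches the usual |z| > 1.96 cutoff (roughly two-sided 95% under a normal approximation). The example therefore lowers the threshold. With real sets of generated personas, the same scoring would be applied to many samples per group.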