This paper examines how Large Language Models (LLMs) reproduce societal norms, particularly heterocisnormativity, and how these norms translate into measurable biases in the text they generate. We investigate whether explicit information about a subject's gender or sexuality influences LLM responses across three subject categories: queer-marked, non-queer-marked, and the normalized "unmarked" category. Representational imbalances are operationalized as measurable differences in English sentence completions across four dimensions: sentiment, regard, toxicity, and prediction diversity. Our findings show that Masked Language Models (MLMs) produce the least favorable sentiment, the highest toxicity, and the most negative regard for queer-marked subjects. Autoregressive Language Models (ARLMs) partially mitigate these patterns, while closed-access ARLMs tend to produce more harmful outputs for unmarked subjects. Results suggest that LLMs reproduce normative social assumptions, though the form and degree of bias depend strongly on specific model characteristics; these differences may redistribute, but not eliminate, representational harms.