Large language models (LLMs) have demonstrated remarkable capabilities in simulating human behaviour and social intelligence. However, they risk perpetuating societal biases, especially when demographic information is involved. We introduce a novel framework that uses cosine distance to measure semantic shifts in responses and an LLM-judged Preference Win Rate (WR) to assess how demographic prompts affect response quality across power-disparate social scenarios. Evaluating five LLMs over 100 diverse social scenarios and nine demographic axes, our findings suggest a "default persona" bias toward middle-aged, able-bodied, native-born, Caucasian, atheistic males with centrist views. Moreover, interactions involving specific demographics are associated with lower-quality responses. Lastly, the presence of power disparities increases variability in response semantics and quality across demographic groups, suggesting that implicit biases may be heightened under power-imbalanced conditions. These insights expose the demographic biases inherent in LLMs and point toward paths for future bias mitigation efforts.
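The semantic-shift measure named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cosine distance between the embedding of a baseline response and the embedding of a demographically primed response serves as the shift score (the embedding vectors here are hypothetical placeholders; in practice they would come from a sentence-embedding model).

```python
import math

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical embeddings: a response to a neutral prompt vs. the same
# prompt with a demographic attribute injected.
baseline_emb = [0.21, 0.67, 0.12, 0.48]
primed_emb   = [0.09, 0.61, 0.33, 0.41]

semantic_shift = cosine_distance(baseline_emb, primed_emb)
```

A shift of 0 means the two responses point in the same semantic direction; larger values indicate the demographic prompt moved the response further away in embedding space.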