Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in their training data. We propose StereoMap, a theoretically grounded framework for gaining insight into how LLMs perceive the way society views demographic groups. The framework builds on the Stereotype Content Model (SCM), a well-established theory from psychology. According to the SCM, stereotypes are not all alike; instead, the dimensions of Warmth and Competence delineate the nature of a stereotype. Based on the SCM, StereoMap maps LLMs' perceptions of social groups (defined by socio-demographic features) along the dimensions of Warmth and Competence. The framework also enables investigation of the keywords and verbalized reasoning behind LLMs' judgments to uncover the underlying factors that influence their perceptions. Our results show that LLMs exhibit a diverse range of perceptions of these groups, characterized by mixed evaluations along the dimensions of Warmth and Competence. Moreover, our analysis of the LLMs' reasoning indicates that they demonstrate an awareness of social disparities, often citing statistical data and research findings to support their reasoning. This study contributes to the understanding of how LLMs perceive and represent social groups, shedding light on their potential biases and the perpetuation of harmful associations.