Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, composed of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
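As a minimal sketch of how a country-conditioned similarity metric of this kind could be computed, the example below assumes that similarity is one minus the Jensen-Shannon distance between the model's answer distribution and a country's answer distribution, averaged over questions. The choice of distance, the function names `js_distance` and `similarity`, and the toy distributions are assumptions for illustration and are not taken from the text above.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    Uses base-2 logarithms, so the result lies in [0, 1].
    Assumes p and q have the same length and each sums to 1.
    """
    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; m is positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, divergence))


def similarity(model_dists, country_dists):
    """Mean of (1 - JS distance) over questions (hypothetical metric).

    model_dists[i] and country_dists[i] are answer distributions over
    the same options for question i.
    """
    scores = [1.0 - js_distance(p, q)
              for p, q in zip(model_dists, country_dists)]
    return sum(scores) / len(scores)


# Toy data: two questions, answer-option probabilities (made up).
model = [[0.7, 0.2, 0.1], [0.5, 0.5]]
country = [[0.6, 0.3, 0.1], [0.1, 0.9]]
score = similarity(model, country)
```

Under these assumptions, a score of 1 means the model's answer distributions match the country's on every question, and a score of 0 means they never overlap.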