We ran a large-scale experiment (N=8000) to test whether GPT-4 can replicate cross-cultural differences in the Big Five personality traits, measured with the Ten-Item Personality Inventory. We chose the US and South Korea as the cultural pair because prior research suggests substantial personality differences between people from these two countries. We manipulated the target of the simulation (US vs. Korean), the language of the inventory (English vs. Korean), and the language model (GPT-4 vs. GPT-3.5). GPT-4 replicated the cross-cultural differences for each factor. However, relative to the human samples, the simulated ratings had upward-biased means, less variation, and lower structural validity. These results provide preliminary evidence that LLMs can aid cross-cultural researchers and practitioners.
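The three manipulated factors (simulation target, inventory language, and language model) can be read as a 2 x 2 x 2 design. The following is a minimal sketch of how such a design might be enumerated, not the authors' actual procedure. The abstract does not say whether the design is fully crossed or how the 8000 simulated respondents are allocated, so the full crossing and the even split across cells are both assumptions here, and the names `FACTORS`, `TOTAL_N`, and `build_design` are hypothetical.

```python
from itertools import product

# Factors and levels as named in the abstract.
FACTORS = {
    "target": ["US", "Korean"],
    "inventory_language": ["English", "Korean"],
    "model": ["GPT-4", "GPT-3.5"],
}

TOTAL_N = 8000  # total sample size reported in the abstract


def build_design(factors, total_n):
    """Enumerate every cell of a fully crossed factorial design.

    Assumptions (not stated in the abstract): the factors are fully
    crossed and respondents are split evenly across the cells.
    """
    names = list(factors)
    # One dict per combination of factor levels.
    cells = [dict(zip(names, levels)) for levels in product(*factors.values())]
    per_cell, remainder = divmod(total_n, len(cells))
    if remainder:
        raise ValueError("total_n does not divide evenly across cells")
    return [{**cell, "n": per_cell} for cell in cells]


design = build_design(FACTORS, TOTAL_N)
for cell in design:
    print(cell)
```

Under these assumptions the design has eight cells. Each cell would then correspond to one prompting condition, for example simulating a Korean respondent who completes the English inventory with GPT-3.5.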