We used a large-scale experiment (N = 8,000) to test whether GPT-4 can replicate cross-cultural differences in the Big Five personality traits, as measured by the Ten-Item Personality Inventory (TIPI). We chose the US and South Korea as the cultural pair because prior research suggests substantial personality differences between people from these two countries. We manipulated the target of the simulation (US vs. Korean), the language of the inventory (English vs. Korean), and the language model (GPT-4 vs. GPT-3.5). Our results show that GPT-4 replicated the cross-cultural differences for each factor. However, the simulated responses showed an upward bias in mean ratings, less variation than the human samples, and lower structural validity. Overall, we provide preliminary evidence that large language models (LLMs) can aid cross-cultural psychological research.