As large language models (LLMs) are deployed in an ever wider range of real-world settings, it is crucial to understand their decision-making when faced with moral dilemmas. Inspired by "The Moral Machine Experiment", a large-scale cross-cultural study of human moral preferences, we pose the same set of moral choices to LLMs. We translate 1K moral-dilemma vignettes, parametrically varied along key axes, into 100+ languages and elicit the preferences of LLMs in each of these languages. We then compare the LLMs' responses to those of human speakers of the same languages, drawing on a dataset of 40 million human moral judgments. We find that LLMs align more closely with human preferences in languages such as English, Korean, Hungarian, and Chinese, but less closely in languages such as Hindi and Somali (spoken in Africa). Moreover, we characterize the explanations LLMs give for their moral choices and find that fairness is the dominant supporting reason behind GPT-4's decisions, while utilitarianism dominates those of GPT-3. We also uncover "language inequality" (which we define as a model attaining different levels of development across languages) in a series of meta-properties of moral decision-making.
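To make the per-language alignment comparison concrete, below is a minimal Python sketch, not the authors' released pipeline: it scores agreement for each language as one minus the mean absolute difference between LLM and human choice rates across dilemma axes. Every name and number in it (the spare_young / spare_many axes, the rate tables, the scoring rule itself) is a hypothetical placeholder chosen for illustration.

```python
# A minimal sketch (not the authors' released code) of a per-language
# alignment comparison: for each language, compare the LLM's choice
# rates on parametrically varied dilemmas with human choice rates for
# speakers of that language, and score their agreement.
# All values and names below are illustrative assumptions.

# Hypothetical choice rates: fraction of trials in which the model / humans
# chose to spare a given group, keyed by (language, dilemma_axis).
llm_choice_rate = {
    ("en", "spare_young"): 0.81, ("en", "spare_many"): 0.90,
    ("hi", "spare_young"): 0.52, ("hi", "spare_many"): 0.60,
}
human_choice_rate = {
    ("en", "spare_young"): 0.78, ("en", "spare_many"): 0.87,
    ("hi", "spare_young"): 0.74, ("hi", "spare_many"): 0.85,
}

def alignment(lang: str) -> float:
    """Agreement in [0, 1]: one minus the mean absolute difference
    between LLM and human choice rates across dilemma axes."""
    axes = [a for (l, a) in llm_choice_rate if l == lang]
    diffs = [abs(llm_choice_rate[(lang, a)] - human_choice_rate[(lang, a)])
             for a in axes]
    return 1.0 - sum(diffs) / len(diffs)

for lang in ("en", "hi"):
    # With the placeholder data, "en" scores higher than "hi",
    # mirroring the closer-vs-looser alignment pattern in the abstract.
    print(f"{lang}: alignment = {alignment(lang):.2f}")
```

The mean-absolute-difference score is just one simple choice; a correlation across many axes, or a distance between full preference vectors, would serve the same illustrative purpose.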