Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LLMs to persona role-play, prompting an LLM to assume a specific character. Using the Moral Foundations Questionnaire (MFQ), we introduce a benchmark that quantifies two properties: moral susceptibility and moral robustness, defined from the variability of MFQ scores across and within personas, respectively. We find that, for moral robustness, model family accounts for most of the variance, while model size shows no systematic effect. The Claude family is, by a significant margin, the most robust, followed by Gemini and GPT-4 models, with other families exhibiting lower robustness. In contrast, moral susceptibility exhibits a mild family effect but a clear within-family size effect, with larger variants being more susceptible. Moreover, robustness and susceptibility are positively correlated, an association that is more pronounced at the family level. Additionally, we present moral foundation profiles for models without persona role-play and for personas averaged across models. Together, these analyses provide a systematic view of how persona conditioning shapes moral behavior in LLMs.
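The two metrics are defined only informally above: susceptibility as variability of MFQ scores across personas, robustness as (low) variability within a persona. A minimal sketch of one plausible operationalization, where `scores` maps each persona to repeated MFQ scores from a single model, could look as follows; the exact estimators (sample standard deviations, sign convention for robustness) are assumptions, not the paper's formulas:

```python
import statistics

def susceptibility(scores: dict[str, list[float]]) -> float:
    """Across-persona variability: std. dev. of per-persona mean MFQ scores."""
    persona_means = [statistics.mean(v) for v in scores.values()]
    return statistics.stdev(persona_means)

def robustness(scores: dict[str, list[float]]) -> float:
    """Within-persona stability: negated mean of per-persona std. devs.,
    so that a higher value means a more robust model."""
    within = [statistics.stdev(v) for v in scores.values()]
    return -statistics.mean(within)

# Toy example with three hypothetical personas, three MFQ runs each.
scores = {
    "judge":   [3.0, 3.1, 2.9],
    "pirate":  [4.0, 4.2, 3.8],
    "teacher": [3.5, 3.5, 3.5],
}
print(susceptibility(scores))  # → 0.5
```

Under these assumed estimators, a model whose persona means spread widely scores high on susceptibility, while a model whose repeated answers per persona barely move scores high on robustness, matching the across/within contrast described above.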