Human moral decisions depend heavily on context, yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with systematic contextual variations of three kinds known from moral psychology to shift human judgment: consequentialist, emotional, and relational. Evaluating 22 LLMs, we find that nearly all models are context-sensitive, shifting their judgments toward rule-violating behavior. Comparing these results with a human survey, we find that models and humans respond most strongly to different contextual variations, and that a model aligned with human judgments in the base case is not necessarily aligned in its contextual sensitivity. This raises the question of how to control contextual sensitivity, which we address with an activation steering approach that can reliably increase or decrease a model's contextual sensitivity.