Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, it is critical to determine whether they encode these social nuances. We characterize machine behavior using the Whistleblower's Dilemma, varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the models' reasoning processes, we identify a clear cross-perspective divergence: while moral rightness judgments remain consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with the models' moral rightness judgments rather than with their own predictions of human behavior. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in the models' internal world-modeling, a gap that may lead to significant misalignment in real-world deployments.