How Utilitarian Are OpenAI's Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025)

Pfeffer, Krügel, and Uhl (2025) report that OpenAI's reasoning model o1-mini produces more utilitarian responses to the trolley problem and the footbridge dilemma than the non-reasoning model GPT-4o. I replicate their study with four current OpenAI models and extend it with prompt-variant testing. The trolley finding does not survive: GPT-4o's low utilitarian rate reflects not a deontological commitment but safety refusals triggered by the prompt's advisory framing. When the question is framed as "Is it morally permissible...?" instead of "Should I...?", GPT-4o gives 99% utilitarian responses. All models converge on utilitarian answers when prompt confounds are removed. The footbridge finding survives with blemishes. Reasoning models tend to give more utilitarian responses than non-reasoning models across prompt variations, but they often refuse to answer the dilemma or, when they do answer, give a non-utilitarian answer. These results demonstrate that single-prompt evaluations of LLM moral reasoning are unreliable: multi-prompt robustness testing should be standard practice for any empirical claim about LLM behavior.
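
The abstract's central distinction, that a low utilitarian rate can come from refusals rather than from non-utilitarian answers, disappears if responses are scored as one undifferentiated rate. The Python sketch below is a minimal illustration, assuming each response has already been labeled `utilitarian`, `non_utilitarian`, or `refusal` (a hypothetical coding scheme, not necessarily the one used in the study). The counts in the example are invented for illustration and are not the study's data. For each prompt framing it reports the refusal rate, the utilitarian share of all responses, and the utilitarian share of answered responses.

```python
from collections import Counter

# Hypothetical coding scheme for illustration; not taken from the study.
LABELS = ("utilitarian", "non_utilitarian", "refusal")


def summarize(labels: list[str]) -> dict[str, float]:
    """Return refusal rate and utilitarian rates over all vs. answered responses."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    answered = counts["utilitarian"] + counts["non_utilitarian"]
    return {
        "refusal_rate": counts["refusal"] / total if total else 0.0,
        # Naive rate: refusals count against the utilitarian share.
        "utilitarian_rate_all": counts["utilitarian"] / total if total else 0.0,
        # Conditional rate: only responses that actually take a side.
        "utilitarian_rate_answered": (
            counts["utilitarian"] / answered if answered else float("nan")
        ),
    }


def compare_framings(responses: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Summarize each prompt framing separately so framing effects stay visible."""
    return {framing: summarize(labels) for framing, labels in responses.items()}


if __name__ == "__main__":
    # Invented example: 10 labeled responses per framing for a single model.
    example = {
        "Should I ...?": ["refusal"] * 7 + ["utilitarian"] * 3,
        "Is it morally permissible ...?": ["utilitarian"] * 9 + ["non_utilitarian"],
    }
    for framing, stats in compare_framings(example).items():
        print(framing, {k: round(v, 2) for k, v in stats.items()})
```

With these invented counts, the "Should I ...?" framing yields a utilitarian rate of 0.3 over all responses but 1.0 over answered responses, the kind of gap a single-prompt, single-rate evaluation would hide.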
