MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

Yu Ying Chiu, Michael S. Lee, Rachel Calcott, Brandon Handoko, Paul de Font-Reaulx, Raphaël Millière, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, Yash Maurya, Christina Q Knight, Harry R. Lloyd, Florence Bacus, Conor Downey, Mantas Mazeika, Bing Liu, Yejin Choi, Mitchell L Gordon, Sydney Levine

from arxiv, 46 pages, 8 figures, 10 tables. Published in ICLR 2026. Accepted at CHAI workshop and SPP 2026 (non-archival)

As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative for us to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To this end, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, and covers cases where AI advises humans on moral decisions as well as cases where it makes moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples to test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
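
As a rough illustration of the rubric idea described in the abstract, the sketch below pairs one scenario with criteria that a response should include or avoid and computes a simple score. The `Criterion` class, the `rubric_score` function, the example criteria, and the unweighted fraction-satisfied scoring rule are all hypothetical and chosen for illustration; the abstract does not specify the benchmark's data format or scoring formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical layout, not the benchmark's schema)."""
    text: str
    should_include: bool  # True: reasoning should contain it; False: it should be avoided


def rubric_score(criteria: List[Criterion], observed: List[bool]) -> float:
    """Return the fraction of criteria satisfied.

    observed[i] is True when a judge found criteria[i].text in the response.
    An "include" criterion is satisfied when observed; an "avoid" criterion
    is satisfied when not observed. Unweighted scoring is an assumption.
    """
    if len(criteria) != len(observed):
        raise ValueError("criteria and observed must have the same length")
    if not criteria:
        return 0.0
    satisfied = sum(
        1 for criterion, seen in zip(criteria, observed)
        if seen == criterion.should_include
    )
    return satisfied / len(criteria)


# Hypothetical rubric for a single moral scenario.
example_rubric = [
    Criterion("Identifies the competing interests of each stakeholder", True),
    Criterion("Weighs the trade-offs between the available options", True),
    Criterion("Gives an actionable recommendation", True),
    Criterion("Asserts a conclusion with no justification", False),
]

# Judge verdicts for one response: the second criterion was missed.
score = rubric_score(example_rubric, [True, False, True, False])
print(score)  # 0.75
```

Scoring the reasoning against criteria rather than against a single correct answer is what lets two responses with different conclusions both score well, which matches the abstract's point that moral dilemmas admit multiple defensible conclusions.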

