Sparks of Rationality: Do Reasoning LLMs Align with Human Judgment and Choice?

Large Language Models (LLMs) are increasingly positioned as decision engines for hiring, healthcare, and economic judgment, yet real-world human judgment reflects a balance between rational deliberation and emotion-driven bias. If LLMs are to participate in high-stakes decisions or serve as models of human behavior, it is critical to assess whether they exhibit analogous patterns of (ir)rationalities and biases. To this end, we evaluate multiple LLM families on (i) benchmarks testing core axioms of rational choice and (ii) classic decision domains from behavioral economics and social norms where emotions are known to shape judgment and choice. Across settings, we show that deliberate "thinking" reliably improves rationality and pushes models toward expected-value maximization. To probe human-like affective distortions and their interaction with reasoning, we use two emotion-steering methods: in-context priming (ICP) and representation-level steering (RLS). ICP induces strong directional shifts that are often extreme and difficult to calibrate, whereas RLS produces more psychologically plausible patterns but with lower reliability. Our results suggest that the same mechanisms that improve rationality also amplify sensitivity to affective interventions, and that different steering methods trade off controllability against human-aligned behavior. Overall, this points to a tension between reasoning and affective steering, with implications for both human simulation and the safe deployment of LLM-based decision systems.
