Large reasoning models (LRMs) have achieved remarkable success in complex problem-solving, yet they often suffer from computational redundancy or unfaithful reasoning. Current methods for shaping LRM behavior typically rely on reinforcement learning or fine-tuning with gold-standard reasoning traces, a paradigm that is both computationally expensive and difficult to scale. In this paper, we reveal that LRMs possess latent \textit{reasoning beliefs} that internally track their own reasoning traits and can be captured through simple logit probing. Building on this insight, we propose Reasoning Belief Engineering (RELIEF), a simple yet effective framework that shapes LRM behavior by aligning the model's self-concept with a target belief blueprint. Crucially, RELIEF bypasses the need for reasoning-trace supervision entirely: it internalizes desired traits by fine-tuning on synthesized, self-reflective question-answer pairs that affirm the target belief. Extensive experiments on efficiency and faithfulness tasks demonstrate that RELIEF matches or outperforms behavior-supervised and preference-based baselines at lower training cost. Further analysis confirms that shifting a model's reasoning belief effectively shapes its actual behavior.
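As a minimal illustration of the logit-probing idea mentioned above (a sketch, not the paper's exact protocol), the following Python snippet poses a self-reflective yes/no question to a causal LM and compares the next-token logits for ``Yes'' and ``No''. The model name, prompt template, and trait wording are illustrative assumptions, not the paper's actual setup.

\begin{verbatim}
# Minimal sketch of probing a latent "reasoning belief" via next-token logits.
# Assumptions: a HuggingFace causal LM; the model name, prompt wording, and
# choice of probed trait ("concise reasoning") are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # hypothetical LRM backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
)

# Self-reflective probe question about a reasoning trait.
prompt = (
    "Question: Do you tend to produce concise reasoning traces?\n"
    "Answer (Yes or No):"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Read the model's "belief" as the relative mass on " Yes" vs. " No".
yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
no_id = tokenizer.encode(" No", add_special_tokens=False)[0]
belief = torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()
print(f"P(Yes | belief probe) = {belief:.3f}")
\end{verbatim}

Under this reading, a belief blueprint would fix target answers for a set of such probes, and RELIEF-style training would fine-tune on QA pairs affirming those answers; the details above are only a plausible instantiation.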