Despite the strong performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, which limits their reasoning capabilities. To theoretically formalize desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose the compute law, which hypothesizes that reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Since question complexity is difficult to quantify in practice, we examine these hypotheses through two tractable properties of the laws: monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two properties in large reasoning models. Our evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective finetuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with the compute law consistently improves reasoning performance across multiple benchmarks and uncovers synergistic effects across properties and laws. Project page: https://lore-project.github.io/