Despite the strong performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, which limits their reasoning capabilities. To theoretically formalize desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose the compute law, which hypothesizes that reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Since question complexity is difficult to quantify in practice, we examine these hypotheses through two tractable properties of the laws: monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two properties in large reasoning models. Our evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective finetuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with the compute law consistently improves reasoning performance across multiple benchmarks and uncovers synergistic effects across properties and laws. Project page: https://lore-project.github.io/