Recently, large reasoning models have demonstrated exceptional performance on a wide range of tasks. However, they often consume excessive tokens even for simple queries, leading to wasted resources and prolonged user latency. To address this challenge, we propose SelfBudgeter, a self-adaptive reasoning strategy for efficient and controllable reasoning. Specifically, we first train the model to self-estimate the reasoning budget required for a given query. We then introduce budget-guided GRPO for reinforcement learning, which effectively maintains accuracy while reducing output length. Experimental results demonstrate that SelfBudgeter dynamically allocates budgets according to problem complexity, compressing average response length by 61% on math reasoning tasks while maintaining accuracy. Furthermore, SelfBudgeter lets users anticipate how long generation will take and decide whether to continue or stop. Users can also directly control the reasoning length by setting a token budget upfront.
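To make the budget-guided reward concrete, the sketch below shows one plausible way to combine answer correctness with adherence to the model's self-estimated token budget. This is a minimal illustration only, not the paper's exact formulation: the function name, the weights alpha and beta, and the clipping scheme are assumptions introduced here for clarity.

```python
# Illustrative sketch (assumed formulation): a budget-guided reward that
# favors correct answers whose length stays close to the model's own
# self-estimated token budget.

def budget_guided_reward(is_correct: bool,
                         response_len: int,
                         predicted_budget: int,
                         alpha: float = 1.0,
                         beta: float = 0.5) -> float:
    """Combine answer correctness with adherence to the predicted budget.

    is_correct       -- whether the final answer matches the reference
    response_len     -- number of tokens actually generated
    predicted_budget -- token budget the model estimated for this query
    alpha, beta      -- assumed weights for accuracy vs. budget adherence
    """
    # Accuracy term: correctness dominates so compression never overrides it.
    accuracy = 1.0 if is_correct else 0.0

    # Budget-adherence term: penalize the relative gap between the actual
    # length and the self-estimated budget, clipped to [0, 1].
    gap = abs(response_len - predicted_budget) / max(predicted_budget, 1)
    adherence = max(0.0, 1.0 - min(gap, 1.0))

    # Only reward adherence when the answer is correct, so the model is not
    # encouraged to produce short but wrong responses.
    return alpha * accuracy + beta * adherence * accuracy


# Example: a correct answer using 620 tokens against a 600-token estimate.
print(budget_guided_reward(True, 620, 600))   # ~1.48
print(budget_guided_reward(False, 620, 600))  # 0.0
```

Such a reward could be plugged into a GRPO-style training loop in place of a plain correctness reward; gating the adherence term on correctness is one way to keep the length penalty from trading accuracy for brevity.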