Large reasoning language models are typically run with fixed inference budgets, which can waste computation or terminate reasoning prematurely. We introduce Certainty-Guided Reasoning (CGR), a model-agnostic adaptive inference procedure that periodically probes whether the current reasoning supports a confident final answer and terminates early once a target certainty threshold is reached, otherwise continuing until the end-of-thinking token or the budget limit. Certainty is estimated from the model's predicted probabilities over the answer tokens, yielding a lightweight stopping criterion. On AIME2025, CGR preserves baseline accuracy while reducing token usage, providing a tunable certainty-efficiency trade-off that can eliminate millions of tokens in aggregate. Across 64 random seeds, CGR exhibits consistent behavior. We also introduce a Grade metric that penalizes incorrect answers and permits abstention, capturing risk-sensitive performance. Results show that CGR improves Grade by abstaining when certainty remains low.
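The lightweight stopping criterion described above can be sketched in a few lines: estimate the certainty of a candidate answer from the model's predicted probabilities over its tokens, and terminate early once that certainty crosses the target threshold. This is a minimal illustrative sketch under assumed names (`answer_certainty`, `should_stop`, `threshold`); the paper's actual probing procedure and threshold values are not specified here.

```python
import math

def answer_certainty(answer_token_logprobs):
    """Estimate certainty of a candidate final answer as the joint
    probability of its tokens, i.e. exp of the summed log-probabilities.
    (Assumed aggregation; other choices, e.g. the minimum token
    probability, would fit the same stopping rule.)"""
    return math.exp(sum(answer_token_logprobs))

def should_stop(answer_token_logprobs, threshold=0.9):
    """Return True if the target certainty threshold is reached,
    signalling that reasoning can terminate early; otherwise the
    model keeps reasoning until the end-of-thinking token or the
    budget limit."""
    return answer_certainty(answer_token_logprobs) >= threshold

# Example: a confidently predicted 3-token answer clears the threshold,
# a hesitant single-token answer does not.
confident = [math.log(0.99)] * 3   # joint probability ~0.97
hesitant = [math.log(0.5)]         # joint probability 0.5
print(should_stop(confident))      # True
print(should_stop(hesitant))       # False
```

Tuning `threshold` gives the certainty-efficiency trade-off mentioned above: a higher threshold stops later (more tokens, more certainty), a lower one stops earlier. When the threshold is never reached, the same certainty estimate can drive the abstention behavior scored by the Grade metric.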