We study the reasoning behavior of large language models (LLMs) under limited compute budgets. In such settings, quickly producing useful partial solutions is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework together with the Anytime Index, a metric that quantifies how effectively solution quality improves as the number of reasoning tokens grows. To further enhance efficiency, we propose an inference-time self-improvement method that uses LLM-synthesized preference data: models learn from comparisons of their own reasoning traces to produce better intermediate solutions. Experiments on the NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
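The abstract does not state the exact formula for the Anytime Index. As a minimal sketch, one plausible instantiation is the normalized area under the solution-quality-vs-reasoning-tokens curve, so that a model reaching high quality early in the budget scores close to 1. The function name `anytime_index`, the assumption that qualities lie in [0, 1], and the convention that quality is 0 before the first intermediate solution are all illustrative assumptions, not the paper's definition.

```python
def anytime_index(budgets, qualities):
    """Sketch of an anytime-style metric (hypothetical; the paper's exact
    definition may differ): normalized area under the quality-vs-token-budget
    curve, computed with the trapezoidal rule.

    budgets   -- increasing token counts at which intermediate solutions
                 were emitted, e.g. [100, 400, 800]
    qualities -- solution quality in [0, 1] at each budget (assumption)
    """
    # Assume quality 0 at budget 0, before any solution is produced.
    pts = [(0.0, 0.0)] + list(zip(budgets, qualities))
    area = 0.0
    for (b0, q0), (b1, q1) in zip(pts, pts[1:]):
        area += 0.5 * (q0 + q1) * (b1 - b0)  # trapezoidal rule
    # Normalize by the maximum budget: 1.0 would mean perfect quality
    # maintained across the entire token budget.
    return area / pts[-1][0]


# Example: quality checkpoints at 100/400/800 reasoning tokens.
print(anytime_index([100, 400, 800], [0.2, 0.6, 0.9]))  # ~0.54
```

Under this reading, two models with the same final accuracy can score differently: the one whose intermediate solutions improve earlier earns a higher index, which matches the abstract's framing of quality gained per reasoning token.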