Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings. Existing tree-search policies are largely budget-agnostic, treating the budget only as a termination condition, which can lead to late-stage over-branching or premature termination. We propose Budget-Guided MCTS (BG-MCTS), a tree-search decoding algorithm that aligns its search policy with the remaining token budget: it starts with broad exploration, then prioritizes refinement and answer completion as the budget depletes, while reducing late-stage branching from shallow nodes. BG-MCTS consistently outperforms budget-agnostic tree-search baselines across different budgets on MATH500 and AIME24/25 with open-weight LLMs.
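The budget-alignment idea above can be sketched as a schedule that maps the remaining token budget to search parameters. The function names, the linear schedule, and the shallow-node penalty term below are illustrative assumptions, not the paper's actual formulation:

```python
import math

def budget_guided_params(used_tokens, total_budget,
                         max_width=6, min_width=1,
                         c_explore_hi=1.4, c_explore_lo=0.3):
    """Hypothetical schedule: as the token budget depletes, shrink the
    branching width and the UCT exploration constant, shifting the search
    from broad exploration toward refinement and answer completion."""
    remaining = max(0.0, 1.0 - used_tokens / total_budget)  # fraction of budget left
    width = max(min_width, round(min_width + (max_width - min_width) * remaining))
    c = c_explore_lo + (c_explore_hi - c_explore_lo) * remaining
    return width, c

def uct_score(total_value, visits, parent_visits, c, depth, remaining,
              shallow_penalty=2.0):
    """Standard UCT plus an assumed penalty term that discourages
    branching from shallow nodes late in the budget."""
    if visits == 0:
        return float("inf")  # unvisited children are expanded first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    # Penalty grows as the budget depletes and shrinks with node depth,
    # so late-stage expansion favors deep (near-complete) branches.
    penalty = shallow_penalty * (1.0 - remaining) / (1.0 + depth)
    return exploit + explore - penalty
```

With a full budget the schedule yields the widest branching and the largest exploration constant; near exhaustion it collapses to a single greedy continuation, matching the intended broad-then-refine behavior.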