Test-time compute methods can significantly improve the reasoning capabilities and problem-solving accuracy of large language models (LLMs). However, these approaches require substantially more computational resources, with most compute wasted on exploring low-diversity branches where the model already exhibits high confidence. We observe that a small subset of uncertain reasoning steps has a disproportionately large impact on final prediction accuracy, and that branching at these critical junctures tends to yield more diverse and higher-quality candidate reasoning steps. We propose Entropy-Gated Branching (EGB), which branches only at high-uncertainty steps and prunes the resulting expansions with a lightweight verifier. On mathematical and financial reasoning benchmarks, EGB improves accuracy by 22.6% over standard inference, and on math benchmarks it runs 31%-75% faster than test-time beam search while achieving higher accuracy. Our results show that dynamic resource allocation during inference can substantially improve both efficiency and effectiveness, offering a more scalable pathway to enhanced LLM reasoning capabilities.
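The gating idea in the abstract can be sketched in a few lines: at each reasoning step, measure the entropy of the model's next-step distribution; branch into several candidates and keep the verifier's favorite only when entropy exceeds a threshold, otherwise decode greedily. This is a minimal illustrative sketch, not the paper's implementation: `propose`, `verify`, the threshold `tau`, and the branching factor `k` are all hypothetical stand-ins.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-step probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def egb_decode(propose, verify, max_steps=8, tau=0.8, k=3):
    """Toy sketch of entropy-gated branching (EGB).

    propose(prefix) -> list of (step, prob) candidate next steps
                       (probabilities summing to 1); [] terminates.
    verify(prefix)  -> scalar score from a lightweight verifier.

    Branch (expand the top-k candidates and keep the one the verifier
    scores highest) only when step entropy exceeds tau; otherwise take
    the single most likely step without any extra compute.
    """
    prefix = []
    for _ in range(max_steps):
        cands = propose(prefix)
        if not cands:
            break
        h = entropy([p for _, p in cands])
        if h > tau:
            # High-uncertainty step: branch, then prune with the verifier.
            top = sorted(cands, key=lambda c: c[1], reverse=True)[:k]
            step, _ = max(top, key=lambda c: verify(prefix + [c[0]]))
        else:
            # Confident step: no branching, just the argmax continuation.
            step, _ = max(cands, key=lambda c: c[1])
        prefix.append(step)
    return prefix
```

With a near-uniform first-step distribution (entropy above `tau`) the sketch branches and lets the verifier pick, while a peaked second-step distribution falls below the gate and is decoded greedily, which is the mechanism by which EGB avoids spending compute on branches the model is already confident about.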