While Large Language Models (LLMs) are effectively aligned through extensive pre-training and fine-tuning, they still struggle with varying levels of uncertainty during token generation. In our investigation of mathematical reasoning, we observe that errors are more likely to arise at tokens exhibiting high entropy and high variance of entropy in the model's output distribution. Based on this observation, we propose a novel approach that dynamically branches the generation process on demand, instead of defaulting to the single most probable token. By exploring multiple branches stemming from high-probability tokens at critical decision points in parallel, the model can discover diverse reasoning paths that might otherwise be missed. We further harness external feedback from larger models to rank the branches and select the most coherent and accurate reasoning path. Experimental results on mathematical word problems and calculation questions show that this branching strategy boosts the reasoning performance of small LLMs by up to 4.6% compared to conventional argmax decoding.
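As a rough illustration of the decoding scheme described above, the following is a minimal, self-contained Python sketch. The function names (`branch_decode`, `step_fn`, `toy_step`), the entropy window, and all thresholds and limits are our own illustrative assumptions, not the paper's actual implementation or hyperparameters.

```python
import math
import heapq

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def branch_decode(step_fn, prompt, max_len=32, entropy_thresh=2.0,
                  var_thresh=0.25, top_k=3, window=4, max_branches=16):
    """Greedy decoding that forks at high-uncertainty steps.

    `step_fn(tokens)` returns a list of (token, prob) pairs for the next
    position. A step counts as a critical decision point when its entropy,
    or the variance of entropy over a recent window, exceeds a threshold;
    the generation then forks on the top-k tokens. All completed branches
    are returned for downstream ranking.
    """
    finished = []
    frontier = [(list(prompt), [])]  # (tokens so far, recent entropies)
    n_branches = 1
    while frontier:
        tokens, ent_hist = frontier.pop()
        while len(tokens) < max_len:
            dist = step_fn(tokens)
            h = entropy([p for _, p in dist])
            ent_hist = (ent_hist + [h])[-window:]
            mean = sum(ent_hist) / len(ent_hist)
            var = sum((x - mean) ** 2 for x in ent_hist) / len(ent_hist)
            top = heapq.nlargest(top_k, dist, key=lambda tp: tp[1])
            if (h > entropy_thresh or var > var_thresh):
                # Critical decision point: fork on the runner-up tokens,
                # subject to a budget on the total number of branches.
                for tok, _ in top[1:]:
                    if n_branches < max_branches:
                        frontier.append((tokens + [tok], list(ent_hist)))
                        n_branches += 1
            # Continue this branch with the argmax token as usual.
            tokens = tokens + [top[0][0]]
        finished.append(tokens)
    return finished

if __name__ == "__main__":
    def toy_step(tokens):
        # Toy stand-in for a language model's next-token distribution,
        # deterministic per prefix.
        import random
        rng = random.Random(hash(tuple(tokens)))
        weights = [rng.random() for _ in range(10)]
        total = sum(weights)
        return [(t, w / total) for t, w in enumerate(weights)]

    branches = branch_decode(toy_step, prompt=[0], max_len=8)
    print(f"explored {len(branches)} candidate continuations")
```

In the full method, the returned branches would then be scored with feedback from a larger model, and the top-ranked reasoning path kept as the final answer.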