Large language models (LLMs) demonstrate strong language understanding and in-context learning abilities, which makes them suitable for natural language processing (NLP) tasks and complex mathematical reasoning. However, when applied to mathematical reasoning tasks, LLMs often fail to generate correct reasoning steps and answers, even though the model assigns high probability to the solutions. To overcome this limitation and improve the mathematical reasoning of fine-tuned LLMs without additional fine-tuning steps, we propose a method that combines Monte Carlo Tree Search (MCTS) with a lightweight energy function to rank decision steps, enabling immediate reaction and precise reasoning. Specifically, we reformulate the fine-tuned LLM as a Residual-based Energy Model (Residual-EBM) and estimate the energy function's parameters with noise contrastive estimation. We then use MCTS, with the energy function as a path verifier, to search the output space and evaluate reasoning paths. In extensive experiments on two mathematical reasoning benchmarks, GSM8k and AQUA-RAT, our method significantly improves the pass@1 metric of the fine-tuned model without requiring additional fine-tuning or alignment via reinforcement learning from human feedback.
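To make the residual formulation concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard residual energy-based form, in which a sequence's unnormalized log-score is the language model's log-probability minus a learned energy, and the standard result that, when noise samples are drawn from the language model itself, noise contrastive estimation reduces to binary classification with logit -E(x). The function names, the toy candidates, and their numbers are hypothetical, and the MCTS search itself is omitted: the energy is used here only to rerank complete candidates.

```python
import math
from typing import List, Tuple


def residual_log_score(lm_logprob: float, energy: float) -> float:
    """Unnormalized log-score under a residual EBM: log p_LM(x) - E(x)."""
    return lm_logprob - energy


def _log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(energy_real: List[float], energy_noise: List[float]) -> float:
    """NCE loss with the LM as the noise distribution (assumption).

    Real (correct) samples should receive low energy, LM-generated
    noise samples high energy; the classifier logit is -E(x).
    """
    pos = sum(_log_sigmoid(-e) for e in energy_real) / len(energy_real)
    neg = sum(_log_sigmoid(e) for e in energy_noise) / len(energy_noise)
    return -(pos + neg)


def rerank(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Sort (text, lm_logprob, energy) candidates by residual score, best first."""
    ranked = sorted(
        candidates,
        key=lambda c: residual_log_score(c[1], c[2]),
        reverse=True,
    )
    return [text for text, _, _ in ranked]


# Hypothetical candidates: the LM prefers "path A" (higher log-probability),
# but the energy function penalizes it, so "path B" is ranked first.
toy_candidates = [("path A", -1.0, 2.0), ("path B", -1.5, 0.1)]
best_first = rerank(toy_candidates)
```

In this toy case the language model alone would pick "path A", while the residual score promotes "path B". This is the situation the abstract describes, where the likeliest output under the fine-tuned model is not necessarily the correct reasoning path.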