Recent advancements in large language models (LLMs) have substantially enhanced their mathematical reasoning abilities. However, these models still struggle with complex problems that require multiple reasoning steps, and they frequently make logical or numerical errors. While numerical mistakes can largely be addressed by integrating a code interpreter, identifying logical errors within intermediate steps is more challenging. Moreover, manually annotating these steps for training is expensive and demands specialized expertise. In this study, we introduce an approach that eliminates the need for manual annotation by leveraging the Monte Carlo Tree Search (MCTS) framework to generate both the process supervision and the evaluation signals automatically. Essentially, when an LLM is well pre-trained, only the mathematical questions and their final answers are needed to generate our training data; step-by-step solutions are not required. We then train a step-level value model designed to improve the LLM's inference process in mathematical domains. Our experiments indicate that using solutions automatically generated by MCTS-enhanced LLMs significantly improves the model's proficiency on intricate mathematical reasoning tasks.
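To make concrete the idea that final answers alone can yield step-level training signals, the following is a minimal sketch and not the implementation used in this study. It replaces full MCTS with plain Monte Carlo rollouts, and it replaces the LLM with a toy sampler: every "solution" has exactly three steps drawn from {1, 2, 3}, and the "final answer" is their sum. All names (`rollout`, `final_answer`, `estimate_step_value`, `label_steps`) are hypothetical and chosen only for illustration.

```python
import random

STEPS_PER_SOLUTION = 3       # toy assumption: every solution has exactly three steps
CANDIDATE_STEPS = (1, 2, 3)  # toy stand-in for the steps an LLM might propose


def rollout(prefix, rng):
    """Complete a partial solution by sampling the remaining steps at random."""
    steps = list(prefix)
    while len(steps) < STEPS_PER_SOLUTION:
        steps.append(rng.choice(CANDIDATE_STEPS))
    return steps


def final_answer(steps):
    """Toy answer extraction: the answer is the sum of the steps."""
    return sum(steps)


def estimate_step_value(prefix, gold_answer, n_rollouts=200, seed=0):
    """Monte Carlo estimate: fraction of completions that reach gold_answer."""
    rng = random.Random(seed)
    hits = sum(
        final_answer(rollout(prefix, rng)) == gold_answer
        for _ in range(n_rollouts)
    )
    return hits / n_rollouts


def label_steps(solution, gold_answer):
    """Attach an estimated value to every prefix (step) of a sampled solution."""
    return [
        (solution[: i + 1], estimate_step_value(solution[: i + 1], gold_answer))
        for i in range(len(solution))
    ]
```

In this toy setting with a gold answer of 9, a prefix such as `[1]` can never reach the answer and receives value 0, whereas `[3, 3]` receives an intermediate value. Labels of this kind, obtained without any human step annotation, are the sort of signal a step-level value model could be trained on; a full MCTS procedure would additionally use such estimates to guide which steps to expand.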