Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), which is akin to human learning processes. Consider a human student who fails to solve a math problem: the student learns from the mistake that was made and from how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistaken step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves performance compared with fine-tuning on chain-of-thought (CoT) data alone. LeMa also benefits specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the state-of-the-art (SOTA) performance achieved by non-execution open-source models on these challenging tasks. Our code, data, and models will be publicly available at https://github.com/microsoft/CodeT.
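The data-collection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Correction`, `CorrectionExample`, `extract_answer`, and `build_correction_data` are hypothetical, the corrector is passed in as a plain callable standing in for GPT-4, the "last number in the text" answer convention is an assumption, and the final check that discards corrections whose answer disagrees with the reference is an added assumption that the abstract does not state.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Correction:
    mistake_step: int    # (1) index of the step identified as mistaken
    explanation: str     # (2) reason for the mistake
    corrected_path: str  # (3) corrected solution ending in the final answer


@dataclass
class CorrectionExample:
    question: str
    wrong_path: str
    correction: Correction


def extract_answer(path: str) -> Optional[str]:
    """Assumed convention: the final answer is the last number in the text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", path)
    return numbers[-1] if numbers else None


def build_correction_data(
    samples: Iterable[Tuple[str, str, str]],
    corrector: Callable[[str, str], Correction],
) -> List[CorrectionExample]:
    """Build mistake-correction pairs from (question, gold_answer, path) triples.

    `corrector` is a hypothetical stand-in for the GPT-4 corrector.
    """
    data: List[CorrectionExample] = []
    for question, gold_answer, path in samples:
        # Only inaccurate reasoning paths are sent to the corrector.
        if extract_answer(path) == gold_answer:
            continue
        correction = corrector(question, path)
        # Assumption: drop corrections that still miss the reference answer.
        if extract_answer(correction.corrected_path) != gold_answer:
            continue
        data.append(CorrectionExample(question, path, correction))
    return data
```

Under these assumptions, the resulting examples would be combined with ordinary CoT data to form the fine-tuning set; how the two are mixed is not specified in the abstract.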