Large language models (LLMs) have recently exhibited remarkable reasoning capabilities in solving math problems. To further improve these capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process. Consider a human student who fails to solve a math problem: they will learn from what mistake they made and how to correct it. Mimicking this error-driven learning process, LEMA incorporates mistake-correction data pairs when fine-tuning LLMs. Specifically, we first collect inaccurate reasoning paths from various LLMs, and then employ GPT-4 as a "corrector" to identify the mistaken step, explain the reason for the mistake, correct the mistake, and generate the final answer. In addition, we apply a correction-centric evolution strategy that effectively expands the question set used for generating correction data. Experiments across various LLMs and reasoning tasks show that LEMA consistently improves over CoT-alone fine-tuning. Our further ablations shed light on the non-homogeneous effectiveness of CoT data and correction data. These results suggest significant potential for LLMs to improve through learning from their mistakes. Our code, models and prompts are publicly available at https://github.com/microsoft/LEMA.
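The correction-data pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual LEMA implementation: the prompt wording, field names, and the `make_training_pair` helper are all assumptions for exposition (the real prompts are released in the repository), and the corrector call itself is elided since it is an external GPT-4 query.

```python
from dataclasses import dataclass

@dataclass
class CorrectionExample:
    """One mistake-correction pair (field names are hypothetical)."""
    question: str        # the math question
    wrong_path: str      # inaccurate reasoning path collected from some LLM
    mistake_step: str    # the step the corrector identified as wrong
    explanation: str     # the corrector's reason why that step is wrong
    corrected_path: str  # corrected solution ending in the final answer

def build_correction_prompt(question: str, wrong_path: str) -> str:
    # Hypothetical corrector prompt; the real prompt templates
    # are published in the LEMA repository.
    return (
        "Below is a math question and an incorrect solution.\n"
        f"Question: {question}\n"
        f"Incorrect solution: {wrong_path}\n"
        "Identify the first wrong step, explain why it is wrong, "
        "then write a corrected solution with the final answer."
    )

def make_training_pair(ex: CorrectionExample) -> dict:
    # The fine-tuning target concatenates mistake identification,
    # explanation, and the corrected path, mirroring the three things
    # the corrector is asked to produce.
    return {
        "input": build_correction_prompt(ex.question, ex.wrong_path),
        "output": (
            f"Mistake step: {ex.mistake_step}\n"
            f"Explanation: {ex.explanation}\n"
            f"Correct solution: {ex.corrected_path}"
        ),
    }
```

These pairs would then be mixed with ordinary CoT data during fine-tuning, which is the setting the ablations above compare against.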