Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, GSM8k and MATH, we demonstrate the strong capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and the model weights are publicly available at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.