The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatile math reasoner, verifier, prover, and augmenter. These abilities can be used to develop the next math LLMs or self-iteration. InternLM-Math obtains open-sourced state-of-the-art performance under the setting of in-context learning, supervised fine-tuning, and code-assisted reasoning in various informal and formal benchmarks including GSM8K, MATH, Hungary math exam, MathBench-ZH, and MiniF2F. Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning. We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning which shows the possibility of using LEAN as a unified platform for solving and proving in math. Our models, codes, and data are released at \url{https://github.com/InternLM/InternLM-Math}.
翻译:大语言模型的数学能力可体现其抽象推理能力。本文介绍并开源了我们的数学推理大语言模型InternLM-Math,该模型基于InternLM2进行持续预训练。我们将思维链推理、奖励建模、形式推理、数据增强与代码解释器统一于序列到序列格式中,通过监督训练使模型成为通用的数学推理器、验证器、证明器与增强器。这些能力可用于开发下一代数学大语言模型或实现自我迭代。在上下文学习、监督微调及代码辅助推理的设置下,InternLM-Math在GSM8K、MATH、匈牙利数学考试、MathBench-ZH和MiniF2F等多种非形式与形式基准测试中均取得了开源模型中的最优性能。我们的预训练模型在未经微调的情况下,于MiniF2F测试集上达到30.3分。我们进一步探索了如何利用LEAN解决数学问题,并研究了其在多任务学习设置下的性能,这展现了使用LEAN作为数学求解与证明统一平台的可能性。我们的模型、代码与数据已发布于\url{https://github.com/InternLM/InternLM-Math}。