We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.