We introduce phi-1, a new large language model for code that is significantly smaller than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and textbooks and exercises synthetically generated with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains a pass@1 accuracy of 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before the finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
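For readers unfamiliar with the pass@1 metric quoted above, the following is a minimal sketch of the unbiased pass@k estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the unit tests. The abstract does not state the sampling setup used for phi-1, so the function names and the per-problem counts below are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Hypothetical helper: implements 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for three problems, 10 samples each (not phi-1 data).
    toy_results = [(10, 10), (10, 3), (10, 0)]
    print(benchmark_pass_at_k(toy_results, k=1))
```

For k = 1 the estimator reduces to c / n per problem, so a benchmark-level pass@1 is the average fraction of samples that pass each problem's tests.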