How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights into LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \url{https://github.com/EleutherAI/pythia}.