We present TinyLlama, a compact 1.1B-parameter language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention) to achieve better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance on a range of downstream tasks, significantly outperforming existing open-source language models of comparable size. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
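The training budget implied by these figures can be worked out with a few lines of arithmetic. The sketch below is illustrative only. It treats the "around"/"approximately" figures as exact round numbers. It also assumes the widely used C ≈ 6·N·D rule of thumb for dense transformer training compute, which is not taken from this abstract. The constant and function names are hypothetical.

```python
# Back-of-the-envelope estimate of the pretraining budget described above.
# Assumptions (not stated in the abstract): the approximate figures are taken
# as exact round numbers, and training compute follows the common
# C ≈ 6 * N * D rule of thumb for dense decoder-only transformers.

N_PARAMS = 1.1e9          # model size: 1.1B parameters
UNIQUE_TOKENS = 1.0e12    # ~1 trillion tokens in the pretraining corpus
EPOCHS = 3                # ~3 passes over the corpus


def tokens_seen(unique_tokens: float, epochs: float) -> float:
    """Total number of tokens processed over all epochs."""
    return unique_tokens * epochs


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute (forward + backward) via C ≈ 6·N·D."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    d = tokens_seen(UNIQUE_TOKENS, EPOCHS)
    c = training_flops(N_PARAMS, d)
    print(f"tokens seen : {d:.2e}")
    print(f"train FLOPs : {c:.2e}")
    print(f"tokens/param: {d / N_PARAMS:.0f}")
```

Under these assumptions, the model processes about 3 × 10^12 tokens in total. That is roughly 2.7 thousand tokens per parameter and on the order of 2 × 10^22 training FLOPs. Treat these as rough magnitudes rather than reported figures.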