Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies for efficiently scaling LLMs beyond 50 billion parameters with minimal trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (a.k.a. FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by bits-per-byte (BPB) on textual corpora. Furthermore, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
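For readers unfamiliar with the BPB metric referenced above, the following is a minimal sketch of how bits-per-byte is typically computed: the model's summed cross-entropy loss (in nats) is converted to bits and normalized by the UTF-8 byte count of the evaluated text, making the score comparable across tokenizers. This is an illustrative sketch, not the paper's evaluation code; the function name and the numbers in the usage example are hypothetical.

```python
import math

def bits_per_byte(total_nll_nats: float, num_utf8_bytes: int) -> float:
    """Convert a summed token-level negative log-likelihood (in nats)
    into bits-per-byte (BPB), a tokenizer-independent language modeling
    metric: BPB = total_nll / (ln(2) * num_bytes)."""
    return total_nll_nats / (math.log(2) * num_utf8_bytes)

# Hypothetical example: a corpus slice of 1,000,000 UTF-8 bytes on which
# a model's summed cross-entropy loss is 450,000 nats.
print(f"{bits_per_byte(450_000, 1_000_000):.4f} BPB")  # ~0.6493
```

Because BPB normalizes by raw bytes rather than tokens, it allows fair multilingual comparison between models with different vocabularies, which is why it is used here to compare language modeling ability across corpora.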