Large Language Models (LLMs) have shown excellent generalization capabilities, which have led to the development of numerous models. To outperform baselines, these models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and train for longer. Analyzing these developments is crucial for identifying the changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. The paper also covers the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates the essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to update this paper regularly by incorporating new sections and featuring the latest models.