A Comprehensive Overview of Large Language Models

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success has led to a large influx of research contributions on diverse topics such as architectural innovations of the underlying neural networks, context length improvements, model alignment, training datasets, benchmarking, efficiency, and more. With techniques developing rapidly and breakthroughs arriving regularly, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Given the rapidly growing literature on LLMs, the research community needs a concise yet comprehensive overview of recent developments in the field. This article provides that overview. It not only treats the existing literature on a broad range of LLM-related concepts systematically, but also provides comprehensive, detailed summaries of individual existing models, datasets, and major insights. We also align our overview with the emerging outlook of this research direction by accounting for other recently published reviews of the broader LLM field. Our self-contained overview discusses relevant background concepts and covers advanced topics at the frontier of this research direction. This review article is intended to serve not only as a systematic survey, but also as a quick, comprehensive reference from which researchers and practitioners can draw insights from extensive, informative summaries of existing works to advance LLM research.
