The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains and is reshaping the landscape of artificial general intelligence. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, spanning both algorithmic and hardware solutions, has been developed to enhance the efficiency of LLMs. This survey delivers a comprehensive review of algorithmic advancements aimed at improving LLM efficiency. Unlike other surveys, which typically focus on specific areas such as training or model compression, this paper examines the multiple dimensions of efficiency essential for the end-to-end algorithmic development of LLMs. Specifically, it covers a broad range of efficiency-related topics, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. This paper aims to serve as a valuable resource for researchers and practitioners and to lay the groundwork for future innovations in this critical research area. Our repository of relevant references is maintained at https://github.com/tding1/Efficient-LLM-Survey.