Large Language Models (LLMs) have revolutionized natural language processing, achieving remarkable success across a wide range of tasks. However, their large size and heavy computational demands pose significant challenges for practical deployment, especially in resource-constrained environments. As these challenges become more pressing, model compression has emerged as a pivotal research area for alleviating these limitations. This paper presents a comprehensive survey of model compression techniques tailored specifically for LLMs. Motivated by the need for efficient deployment, we examine a range of methods, including quantization, pruning, knowledge distillation, and more. For each technique, we highlight recent advances and novel approaches that are shaping LLM research. We also discuss the benchmarking strategies and evaluation metrics needed to assess the effectiveness of compressed LLMs. By summarizing the latest developments and their practical implications, this survey aims to serve as a valuable resource for both researchers and practitioners. As LLMs continue to evolve, we hope this survey will help improve their efficiency and real-world applicability and lay a foundation for future advances in the field.