Large language models are ubiquitous in natural language processing because they can adapt to new tasks without retraining. However, their sheer scale and complexity present unique challenges and opportunities, prompting researchers and practitioners to explore novel methods for training, optimizing, and deploying these models. This literature review surveys techniques for reducing the resource requirements of large language models and compressing them, including quantization, pruning, knowledge distillation, and architectural optimizations. The primary objective is to examine each method in depth and highlight its unique challenges and practical applications. The discussed methods are organized into a taxonomy that gives an overview of the optimization landscape and helps readers navigate it to better understand the research trajectory.