A Survey on Knowledge Distillation of Large Language Models

This survey presents an in-depth exploration of knowledge distillation (KD) techniques for Large Language Models (LLMs), highlighting the pivotal role of KD in transferring advanced capabilities from proprietary models such as GPT-4 to accessible, open-source models such as LLaMA and Mistral. It examines the key disparities between proprietary and open-source LLMs and shows how KD serves as a conduit for giving the latter the advanced functionality and nuanced understanding of the former. The survey is structured around three pillars: algorithm, skill, and verticalization. Together they cover KD mechanisms, the enhancement of specific cognitive abilities, and the practical implications of KD across diverse fields. The survey also examines the interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework for improving LLM performance. By using DA to generate context-rich, skill-specific training data, KD goes beyond its traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insight characteristic of their proprietary counterparts. This work aims to serve as a guide for researchers and practitioners, offering a detailed overview of current methodologies in knowledge distillation and proposing future research directions. By bridging the gap between proprietary and open-source LLMs, the survey underscores the potential for more accessible, efficient, and sustainable AI solutions, fostering a more inclusive and equitable landscape in AI advancement. An associated GitHub repository is available at https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.
