Knowledge distillation from Large Language Models (LLMs) to smaller models has emerged as a critical technique for deploying efficient AI systems. However, current methods for distillation via synthetic data lack pedagogical awareness, treating knowledge transfer as a one-off data synthesis and training task rather than a systematic learning process. In this paper, we propose a novel pedagogically inspired framework for LLM knowledge distillation that draws on fundamental educational principles. Our approach introduces a three-stage pipeline -- Knowledge Identifier, Organizer, and Adapter (IOA) -- that systematically identifies knowledge deficiencies in student models, organizes knowledge delivery through progressive curricula, and adapts representations to match the cognitive capacity of student models. We integrate Bloom's mastery learning principles and Vygotsky's Zone of Proximal Development to create a dynamic distillation process in which student models must approach the teacher model's performance on prerequisite knowledge before advancing, and new knowledge is introduced in controlled, gradual difficulty increments. Extensive experiments using LLaMA-3.1/3.2 and Qwen2.5 as student models demonstrate that IOA achieves significant improvements over baseline distillation methods, with student models retaining 94.7% of teacher performance on DollyEval while using fewer than one-tenth of the parameters. Our framework particularly excels at complex reasoning tasks, showing improvements of 19.2% on MATH and 22.3% on HumanEval over state-of-the-art baselines.
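The mastery-gated progression described above (Bloom-style mastery gating plus ZPD-style incremental difficulty) can be sketched as a minimal training loop. This is an illustrative sketch only: the function and variable names (`mastery_gated_curriculum`, `MASTERY_RATIO`, the toy scoring functions) and all numbers are hypothetical stand-ins, not the paper's actual implementation.

```python
# Minimal sketch of a mastery-gated curriculum loop in the spirit of IOA.
# All names and thresholds are hypothetical illustrations, not the paper's API.

MASTERY_RATIO = 0.95  # assumed gate: student must reach 95% of the teacher's score


def mastery_gated_curriculum(tiers, student_score, teacher_score, train_step):
    """Advance through difficulty tiers (easy -> hard) only after the student
    has nearly matched the teacher on the current tier (mastery learning);
    harder tiers are introduced one increment at a time (ZPD-style pacing)."""
    completed = []
    for tier in tiers:
        # Keep distilling at this tier until the mastery gate is passed.
        while student_score(tier) < MASTERY_RATIO * teacher_score(tier):
            train_step(tier)  # e.g., distill more synthetic data at this tier
        completed.append(tier)
    return completed


# Toy usage with stand-in scoring functions (each training step simply
# bumps the student's score at that tier).
scores = {"easy": 0.5, "medium": 0.3, "hard": 0.1}
teacher = {"easy": 0.9, "medium": 0.8, "hard": 0.7}
order = mastery_gated_curriculum(
    ["easy", "medium", "hard"],
    student_score=lambda t: scores[t],
    teacher_score=lambda t: teacher[t],
    train_step=lambda t: scores.__setitem__(t, scores[t] + 0.1),
)
print(order)  # tiers completed in curriculum order
```

The key design point the sketch captures is that tier ordering is fixed easy-to-hard, but the *time spent* per tier is adaptive: the loop refuses to advance until the student clears the mastery gate relative to the teacher.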