Knowledge Distillation (KD) for Large Language Models (LLMs) has become increasingly important as models grow in size and complexity. Existing distillation approaches focus on imitating teacher behavior, but they often overlook the original learning environment that shaped the teacher's knowledge. Inspired by experiential learning theory and inverse reinforcement learning, we propose Experiential Knowledge Distillation ($\mathcal{X}$-KD), a novel and general framework that enables student models to learn in the teacher's original learning environment. $\mathcal{X}$-KD adopts the Approximate Variational Reward Imitation Learning (AVRIL) framework to jointly model the teacher's original reward function and perform policy distillation, encouraging consistency between the student policy and that reward function. Our derivation shows that $\mathcal{X}$-KD fits within the standard supervised learning framework and applies to both sequence-level and divergence-based distillation methods, underscoring the simplicity and flexibility of the approach. Empirical results show that $\mathcal{X}$-KD outperforms the Generalized KD (GKD) and MiniLLM baselines on abstractive summarization, machine translation, and arithmetic reasoning tasks. In addition, $\mathcal{X}$-KD achieves a better performance-diversity trade-off and greater data efficiency than baseline KD approaches.
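To make the joint objective concrete, the following is a minimal sketch of how an AVRIL-style distillation loss could be written, assuming a variational reward posterior $q_\phi$, student state-action values $Q_\theta$, and illustrative weights $\lambda$ and $\beta$; this is our reconstruction from the original AVRIL formulation, not necessarily the paper's exact objective:
\[
\mathcal{L}_{\mathcal{X}\text{-KD}}(\theta,\phi)
= \underbrace{\mathbb{E}_{x\sim\mathcal{D}}\,
D_{\mathrm{KL}}\!\big(p_T(\cdot\mid x)\,\big\|\,p_{S,\theta}(\cdot\mid x)\big)}_{\text{policy distillation}}
\;-\;\lambda\,\underbrace{\mathbb{E}_{(s,a,s',a')\sim\mathcal{D}}\,
\log q_\phi\!\big(Q_\theta(s,a)-\gamma\,Q_\theta(s',a')\,\big|\,s,a\big)}_{\text{reward consistency}}
\;+\;\beta\,\mathrm{KL}\!\big(q_\phi(R)\,\big\|\,p(R)\big).
\]
Here the first term is the usual divergence-based distillation signal between teacher $p_T$ and student $p_{S,\theta}$, the second term ties the temporal-difference residual of the student's values to the inferred reward posterior (as in AVRIL), and the third term regularizes the posterior toward a reward prior $p(R)$.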