Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the thought processes of the teacher's intermediate layers as a black box. While feature-based distillation attempts to bridge this gap, existing methods (e.g., MSE and asymmetric KL divergence) ignore the rich uncertainty profiles required for the final output. In this paper, we introduce DistillLens, a framework that symmetrically aligns the evolving thought processes of student and teacher models. By projecting intermediate hidden states into the vocabulary space via the Logit Lens, we enforce structural alignment using a symmetric divergence objective. Our analysis shows that this constraint imposes a dual-sided penalty, preventing both overconfidence and underconfidence while preserving the high-entropy information conduits essential for final deduction. Extensive experiments on GPT-2 and Llama architectures demonstrate that DistillLens consistently outperforms standard KD and feature-transfer baselines on diverse instruction-following benchmarks. The code is available at https://github.com/manishdhakal/DistillLens.
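The core mechanism described above (projecting intermediate hidden states into the vocabulary space and aligning the resulting distributions with a symmetric divergence) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, shapes, and the choice of an averaged two-way KL (Jeffreys-style) as the symmetric objective are assumptions for exposition.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens_distribution(hidden, unembed):
    """Project an intermediate hidden state into vocabulary space
    via the unembedding matrix (the "Logit Lens"), then normalize
    the resulting logits into a probability distribution."""
    return softmax(hidden @ unembed)

def symmetric_kl(p, q, eps=1e-12):
    """Dual-sided penalty: KL(p||q) punishes the student for being
    underconfident where the teacher is confident; KL(q||p) punishes
    the reverse (overconfidence). Averaging yields a symmetric loss."""
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)), axis=-1)
    return 0.5 * (kl_pq + kl_qp)

# Toy example with random weights (hypothetical sizes, not the paper's).
rng = np.random.default_rng(0)
vocab, d_teacher, d_student = 50, 16, 8
W_t = rng.normal(size=(d_teacher, vocab))  # teacher unembedding
W_s = rng.normal(size=(d_student, vocab))  # student unembedding
h_t = rng.normal(size=(d_teacher,))        # teacher intermediate state
h_s = rng.normal(size=(d_student,))        # student intermediate state

p = logit_lens_distribution(h_t, W_t)
q = logit_lens_distribution(h_s, W_s)
loss = symmetric_kl(p, q)
```

Note that, unlike the asymmetric forward KL used in standard KD, `symmetric_kl(p, q)` equals `symmetric_kl(q, p)`, so neither model's distribution is treated as the fixed reference.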