Efficient deployment of deep neural networks on resource-constrained devices demands advanced compression techniques that preserve accuracy and interpretability. This paper proposes a machine learning framework that augments Knowledge Distillation (KD) with Integrated Gradients (IG), an attribution method, to optimise the compression of convolutional neural networks. We introduce a novel data augmentation strategy in which IG maps, precomputed from a teacher model, are overlaid onto training images to guide a compact student model toward critical feature representations. This approach leverages the teacher's decision-making insights, enhancing the student's ability to replicate complex patterns with fewer parameters. Experiments on CIFAR-10 demonstrate the efficacy of our method: a student model, compressed 4.1-fold from the MobileNet-V2 teacher, achieves 92.5% classification accuracy, surpassing both the baseline student's 91.4% and traditional KD approaches, while reducing inference latency from 140 ms to 13 ms, a tenfold speedup. We further perform hyperparameter optimisation to ensure efficient learning. Comprehensive ablation studies dissect the contributions of KD and IG, revealing synergistic effects that boost both performance and model explainability. Our method's emphasis on feature-level guidance via IG distinguishes it from conventional KD, offering a data-driven solution for mining transferable knowledge in neural architectures. This work contributes to machine learning by providing a scalable, interpretable compression technique, ideal for edge computing applications where efficiency and transparency are paramount.
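The core augmentation step can be illustrated with a minimal sketch: Integrated Gradients approximated by a Riemann sum along the path from a baseline to the input, and the resulting attribution map alpha-blended onto the image. This is not the paper's code; the toy model, function names, and blending weight are illustrative assumptions. The printed sum checks IG's completeness axiom (attributions sum to f(x) - f(baseline)).

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight-line
    path from `baseline` to the input `x` (hypothetical helper)."""
    alphas = (np.arange(steps) + 0.5) / steps
    total_grad = np.zeros_like(x)
    for a in alphas:
        total_grad += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * (total_grad / steps)

def overlay(image, attribution, alpha=0.4):
    """Blend a normalised attribution map onto an image, mirroring the
    augmentation idea of highlighting teacher-salient pixels."""
    attr = np.abs(attribution)
    attr = attr / (attr.max() + 1e-8)           # scale to [0, 1]
    return (1 - alpha) * image + alpha * attr   # simple alpha blend

# Toy "teacher output": f(x) = sum of squares, analytic gradient 2x.
grad_f = lambda x: 2 * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
ig = integrated_gradients(grad_f, x, baseline)

# Completeness: sum of attributions = f(x) - f(baseline) = 14.
print(round(float(ig.sum()), 3))
```

In the paper's setting, `grad_f` would instead be the teacher network's gradient with respect to input pixels (obtained via autodiff), and `overlay` would be applied to each training image before it is fed to the student during distillation.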