In this paper we introduce InDistill, a model compression approach that combines knowledge distillation and channel pruning in a unified framework to transfer the critical information flow paths from a heavyweight teacher to a lightweight student. Previous methods typically collapse this information because they apply an encoding stage before distillation. By contrast, InDistill prunes the teacher's intermediate layers, reducing their width to that of the corresponding student layers. This enforces architectural alignment, so the intermediate layers can be distilled directly without an encoding stage. In addition, we adopt a curriculum learning-based training scheme that accounts for the distillation difficulty of each layer and for the critical learning periods in which the information flow paths are formed. The proposed method surpasses state-of-the-art performance on three standard benchmarks, namely CIFAR-10, CUB-200, and FashionMNIST, by 3.08%, 14.27%, and 1% mAP, respectively, as well as in more challenging evaluation settings, namely ImageNet and CIFAR-100, by 1.97% and 5.65% mAP, respectively.
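The pruning-then-direct-distillation idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: channels are ranked by the L1 norm of their filter weights, and the distillation loss is a plain mean squared error between activations. The function names (`select_channels`, `prune_features`, `direct_distillation_loss`) and the toy values are hypothetical and chosen only for illustration; the curriculum schedule is not modelled.

```python
from typing import List, Sequence


def select_channels(filter_weights: Sequence[Sequence[float]], keep: int) -> List[int]:
    """Return the indices of the `keep` channels with the largest L1 norm.

    Assumption: L1-norm ranking stands in for the pruning criterion,
    which the abstract does not specify.
    """
    if not 0 < keep <= len(filter_weights):
        raise ValueError("keep must be between 1 and the number of teacher channels")
    norms = [sum(abs(w) for w in channel) for channel in filter_weights]
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    # Restore the original channel order among the kept channels.
    return sorted(ranked[:keep])


def prune_features(features: Sequence[Sequence[float]], kept: Sequence[int]) -> List[List[float]]:
    """Keep only the selected channels of a teacher feature map."""
    return [list(features[i]) for i in kept]


def direct_distillation_loss(
    teacher_features: Sequence[Sequence[float]],
    student_features: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between equally shaped feature maps.

    Assumption: MSE is used here as a generic stand-in for the layer-wise loss.
    """
    if len(teacher_features) != len(student_features):
        raise ValueError("channel counts differ; prune the teacher first")
    total, count = 0.0, 0
    for t_channel, s_channel in zip(teacher_features, student_features):
        if len(t_channel) != len(s_channel):
            raise ValueError("spatial sizes differ")
        for t, s in zip(t_channel, s_channel):
            total += (t - s) ** 2
            count += 1
    return total / count


# Toy teacher layer with 4 channels; the student layer has 2 channels.
teacher_filters = [[0.1, -0.1], [2.0, -1.0], [0.0, 0.05], [-1.5, 0.5]]
teacher_features = [[1.0, 1.0], [2.0, 4.0], [0.0, 0.0], [3.0, 1.0]]
student_features = [[2.0, 3.0], [3.0, 2.0]]

kept = select_channels(teacher_filters, keep=len(student_features))
pruned_teacher = prune_features(teacher_features, kept)
loss = direct_distillation_loss(pruned_teacher, student_features)
print(kept, loss)
```

Because the pruned teacher layer has the same width as the student layer, the two feature maps can be compared element by element, which is why no intermediate encoder appears in the sketch.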