Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learned representations from a well-performing but cumbersome teacher model to a compact student model. Previous KD methods for image super-resolution (SR) mostly compare teacher and student feature maps either directly or after matching their dimensions with basic algebraic operations (e.g., averaging, dot product). These methods overlook the intrinsic semantic differences among feature maps, which arise from the disparate expressive capacities of the two networks. This work presents MiPKD, a multi-granularity mixture-of-priors KD framework that facilitates efficient SR models at two granularities: feature mixture in a unified latent space, and stochastic mixture of network blocks. Extensive experiments demonstrate the effectiveness of the proposed MiPKD method.
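To make the idea of a stochastic network block mixture concrete, the following is a minimal sketch and not the MiPKD implementation. It assumes that teacher and student are chains with the same number of blocks and compatible shapes, that each stage is drawn from the teacher with a fixed probability, and that an L1 distance compares outputs. The names `mixed_forward`, `l1_loss`, `scale`, and `p_teacher` are hypothetical, and the toy blocks stand in for real network blocks.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A "block" here is any function mapping a feature vector to a feature vector.
Block = Callable[[List[float]], List[float]]


def mixed_forward(
    x: List[float],
    student_blocks: Sequence[Block],
    teacher_blocks: Sequence[Block],
    p_teacher: float,
    rng: random.Random,
) -> Tuple[List[float], List[str]]:
    """Forward pass in which each stage is randomly the teacher's or the student's block.

    Assumption (not from the source): one independent draw per stage with
    probability p_teacher of picking the teacher block.
    """
    if len(student_blocks) != len(teacher_blocks):
        raise ValueError("teacher and student must have the same number of blocks")
    choices = []
    for s_blk, t_blk in zip(student_blocks, teacher_blocks):
        use_teacher = rng.random() < p_teacher
        x = (t_blk if use_teacher else s_blk)(x)
        choices.append("T" if use_teacher else "S")
    return x, choices


def l1_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized vectors."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def scale(k: float) -> Block:
    """Toy block that multiplies every element by k (stand-in for a network block)."""
    return lambda v: [k * e for e in v]


# Toy two-stage "networks" used only for illustration.
student_blocks = [scale(0.5), scale(0.5)]
teacher_blocks = [scale(2.0), scale(2.0)]

rng = random.Random(0)
x = [1.0, 2.0]
teacher_out, _ = mixed_forward(x, student_blocks, teacher_blocks, 1.0, rng)
mixed_out, picked = mixed_forward(x, student_blocks, teacher_blocks, 0.5, rng)
# Distance between the mixed path's output and the pure teacher path's output.
loss = l1_loss(mixed_out, teacher_out)
```

In this sketch, setting `p_teacher` to 0 or 1 recovers the pure student or pure teacher path, so the mixed path interpolates between the two at the block level; how MiPKD actually samples blocks and weights its losses is not specified in the abstract.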