Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing computational cost while maintaining accuracy. In visual applications, where large-scale image models are widely used, KD enables efficient deployment. However, architectural diversity introduces semantic discrepancies that hinder the use of intermediate representations. Most existing KD methods are designed for homogeneous models and degrade in heterogeneous scenarios, especially when intermediate features are involved. Prior studies focus mainly on the logit space and make limited use of the semantic information in intermediate layers. To address this limitation, Unified Heterogeneous Knowledge Distillation (UHKD) is proposed, a framework that leverages intermediate features in the frequency domain for cross-architecture transfer. The Fourier transform is applied to capture global feature information, alleviating representational discrepancies between heterogeneous teacher-student pairs. A Feature Transformation Module (FTM) produces compact frequency-domain representations of teacher features, while a learnable Feature Alignment Module (FAM) projects student features and aligns them with the teacher's through multi-level matching. Training is guided by a joint objective that combines mean squared error on intermediate features with Kullback-Leibler divergence on logits. Experiments on CIFAR-100 and ImageNet-1K demonstrate gains of 5.59% and 0.83%, respectively, over the latest method, highlighting UHKD as an effective approach to unifying heterogeneous representations and enabling efficient utilization of visual knowledge.
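To make the training objective concrete, the sketch below shows one way the described pipeline could be assembled in PyTorch: a learnable projection standing in for the FAM, a 2D Fourier transform on intermediate features, an MSE term over frequency-domain representations summed across stages, and a temperature-scaled KL term on logits. All names and choices here (e.g., `fam_proj`, magnitude-based comparison, the `temperature`, `alpha`, and `beta` weights) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the joint distillation objective described in the abstract.
# Module/parameter names are assumptions for illustration only.
import torch
import torch.nn.functional as F

def frequency_feature_loss(student_feat, teacher_feat, fam_proj):
    """MSE between frequency-domain representations of one stage's features.

    student_feat, teacher_feat: (B, C, H, W) intermediate feature maps.
    fam_proj: a learnable projection (e.g., a 1x1 conv) standing in for the
    Feature Alignment Module; it maps student channels to teacher channels.
    """
    aligned = fam_proj(student_feat)                  # project student features
    s_freq = torch.fft.fft2(aligned, norm="ortho")    # 2D Fourier transform
    t_freq = torch.fft.fft2(teacher_feat, norm="ortho")
    # Compare magnitudes so the loss stays real-valued; this is one plausible
    # choice, and the paper's exact frequency-domain formulation may differ.
    return F.mse_loss(s_freq.abs(), t_freq.abs())

def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Kullback-Leibler divergence on temperature-softened logits."""
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    p_t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

def uhkd_loss(stage_pairs, fam_projs, student_logits, teacher_logits,
              ce_loss, alpha=1.0, beta=1.0):
    """Joint objective: task loss + multi-level feature MSE + logit KL."""
    feat_loss = sum(frequency_feature_loss(s, t, proj)
                    for (s, t), proj in zip(stage_pairs, fam_projs))
    return ce_loss + alpha * feat_loss + beta * logit_kd_loss(
        student_logits, teacher_logits)
```

In this reading, the multi-level matching is simply a sum of per-stage frequency-domain MSE terms, and the overall loss weights would be tuned per dataset.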