FiGKD: Fine-Grained Knowledge Distillation via High-Frequency Detail Transfer

Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from a high-capacity teacher model to a smaller student model by aligning their output distributions. However, existing methods often underperform on fine-grained visual recognition tasks, where distinguishing subtle differences between visually similar classes is essential. This performance gap arises because conventional approaches treat the teacher's output logits as a single, undifferentiated signal, implicitly assuming that all of the information they contain is equally beneficial to the student. Consequently, student models may be overloaded with redundant signals and fail to capture the teacher's nuanced decision boundaries. To address this issue, we propose Fine-Grained Knowledge Distillation (FiGKD), a novel frequency-aware framework that decomposes a model's logits into low-frequency (content) and high-frequency (detail) components using the discrete wavelet transform (DWT). FiGKD selectively transfers only the high-frequency components, which encode the teacher's semantic decision patterns, while discarding the redundant low-frequency content already conveyed through ground-truth supervision. Our approach is simple, architecture-agnostic, and requires no access to intermediate feature maps. Extensive experiments on CIFAR-100, TinyImageNet, and multiple fine-grained recognition benchmarks show that FiGKD consistently outperforms state-of-the-art logit-based and feature-based distillation methods across a variety of teacher-student configurations. These findings confirm that frequency-aware logit decomposition enables more efficient and effective knowledge transfer, particularly in resource-constrained settings.
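To make the decomposition concrete, the following is a minimal sketch of the idea, not the authors' implementation. It assumes a single-level Haar wavelet applied along the class dimension of one logit vector, a mean-squared-error penalty on the high-frequency (detail) coefficients, and a hypothetical weight `lam`; the abstract specifies none of these choices. The function names `haar_dwt`, `cross_entropy`, and `high_freq_distill_loss` are illustrative only.

```python
import math


def haar_dwt(logits):
    """Single-level Haar DWT of an even-length sequence of logits.

    Returns (approx, detail): the low-frequency (content) and
    high-frequency (detail) coefficients. The Haar wavelet is an
    assumption made for this sketch.
    """
    if len(logits) % 2 != 0:
        raise ValueError("this sketch expects an even number of logits")
    s = math.sqrt(2.0)
    approx = [(logits[i] + logits[i + 1]) / s for i in range(0, len(logits), 2)]
    detail = [(logits[i] - logits[i + 1]) / s for i in range(0, len(logits), 2)]
    return approx, detail


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def high_freq_distill_loss(student_logits, teacher_logits, label, lam=1.0):
    """Ground-truth loss plus a penalty on detail coefficients only.

    The teacher's low-frequency coefficients are discarded; only the
    detail coefficients are matched. MSE and `lam` are assumptions.
    """
    _, detail_student = haar_dwt(student_logits)
    _, detail_teacher = haar_dwt(teacher_logits)
    detail_mse = sum(
        (a - b) ** 2 for a, b in zip(detail_student, detail_teacher)
    ) / len(detail_teacher)
    return cross_entropy(student_logits, label) + lam * detail_mse


if __name__ == "__main__":
    teacher = [1.0, 3.0, 2.0, 2.0]
    student = [0.5, 2.0, 1.0, 1.5]
    print(haar_dwt(teacher))
    print(high_freq_distill_loss(student, teacher, label=1))
```

Under the Haar assumption, the detail coefficients depend only on differences between adjacent logits, so adding a constant offset to every teacher logit leaves the distillation term unchanged.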
