In visual tasks, large teacher models capture essential features and rich semantic information, which enhances performance. However, distilling this information into smaller student models often incurs a performance loss due to structural differences and capacity limitations. To address this, we propose a distillation framework based on graph knowledge, comprising a multi-level feature alignment strategy and an attention-guided mechanism that provides a targeted learning trajectory for the student model. We emphasize spectral embedding (SE) as a key technique in our distillation process: it aligns the student's feature space with the relational knowledge and structural dependencies of the teacher network. This method captures the teacher's understanding in a graph-based representation, enabling the student model to mimic more accurately the complex structural dependencies present in the teacher. Compared with methods that focus only on specific distillation targets, our strategy considers not only the key features within the teacher model but also the relationships and interactions among feature sets, encoding this information into a graph structure so that the dynamic relationships among features can be understood and exploited from a global perspective. Experiments show that our method outperforms previous feature distillation methods on the CIFAR-100, MS-COCO, and Pascal VOC datasets, demonstrating its effectiveness and broad applicability.
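To make the spectral-embedding idea concrete, the following is a minimal NumPy sketch of one plausible instantiation, not the paper's exact formulation: build a cosine-similarity affinity graph over a batch of layer features, take the leading non-trivial eigenvectors of its normalized Laplacian as the spectral embedding, and penalize the discrepancy between the student's and teacher's embeddings. The function names, the choice of cosine affinity, and the mean-squared alignment loss are all illustrative assumptions.

```python
import numpy as np

def spectral_embedding(features, dim=4):
    """Sketch of a spectral embedding of a feature-affinity graph (assumed form).

    features: (N, D) array of feature vectors for N samples from one layer.
    Returns an (N, dim) embedding from the normalized graph Laplacian.
    """
    # Cosine-similarity affinity graph over the N samples (negative values clipped).
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    affinity = np.clip(f @ f.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)

    # Symmetrically normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1) + 1e-8)
    lap = np.eye(len(features)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # Eigenvectors of the smallest non-trivial eigenvalues encode graph structure.
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1:dim + 1]  # skip the trivial (near-constant) eigenvector

def se_distill_loss(student_feats, teacher_feats, dim=4):
    """Align student and teacher spectral embeddings (illustrative loss)."""
    zs = spectral_embedding(student_feats, dim=dim)
    zt = spectral_embedding(teacher_feats, dim=dim)
    # Eigenvector signs are arbitrary, so align signs column-wise before comparing.
    signs = np.sign(np.sum(zs * zt, axis=0))
    signs[signs == 0] = 1.0
    return float(np.mean((zs * signs - zt) ** 2))
```

In practice such a loss would be added to the standard distillation objective, so the student matches the teacher's relational structure (how samples relate to one another) rather than only its per-sample activations.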