Transformers have emerged as the superior choice for face recognition tasks, but limited hardware acceleration support hinders their deployment on mobile devices. In contrast, Convolutional Neural Networks (CNNs) benefit from hardware-compatible acceleration libraries. It is therefore essential to preserve distillation efficacy when transferring knowledge from a Transformer-based teacher model to a CNN-based student model, a setting known as Cross-Architecture Knowledge Distillation (CAKD). Despite its potential, applying CAKD to face recognition encounters two challenges: 1) the teacher and student encode disparate spatial information at each pixel, which obstructs the alignment of their feature spaces, and 2) the teacher network is not trained in the role of a teacher and therefore lacks proficiency in handling distillation-specific knowledge. To overcome these two challenges, 1) we first introduce a Unified Receptive Fields Mapping module (URFM) that maps pixel features of the teacher and student into local features with unified receptive fields, thereby synchronizing the pixel-wise spatial information of the teacher and student. 2) We then develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge while preserving the model's discriminative capacity. Extensive experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.