Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability. Inspired by knowledge distillation (KD), recent research has improved SNN performance with the help of a pre-trained teacher model. However, an additional teacher model requires significant computational resources, and manually designing an appropriate teacher network architecture is tedious. In this paper, we explore cost-effective self-distillation learning for SNNs to circumvent these concerns. Without an explicitly defined teacher, the SNN generates pseudo-labels and learns consistency during training. On the one hand, we extend the timestep of the SNN during training to create an implicit temporal ``teacher'' that guides the learning of the original ``student'', i.e., temporal self-distillation. On the other hand, we use the final output of the SNN to guide the output of the weak classifier at the intermediate stage, i.e., spatial self-distillation. Our temporal-spatial self-distillation (TSSD) learning method introduces no inference overhead and exhibits excellent generalization ability. Extensive experiments on the static image datasets CIFAR10/100 and ImageNet, as well as the neuromorphic datasets CIFAR10-DVS and DVS-Gesture, validate the superior performance of the TSSD method. This paper presents a novel way of fusing SNNs with KD, providing insights into high-performance SNN learning methods.
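The two self-distillation terms described in the abstract can be illustrated with a minimal sketch of how a combined training loss might be assembled. Everything specific here is an assumption made for illustration and is not taken from the abstract: the use of KL divergence as the consistency measure, the weights `alpha` and `beta`, the `temperature` parameter, the function names, and the choice of the original-timestep output as the spatial teacher. The logits are assumed to be time-averaged outputs of the same SNN, once with the original timestep (the student), once with the extended timestep (the temporal teacher), and once from an intermediate weak classifier.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def cross_entropy(logits, label):
    """Standard cross-entropy against a ground-truth class index."""
    return -log_softmax(logits)[label]


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def tssd_loss(student_logits, teacher_logits, aux_logits, label,
              alpha=1.0, beta=1.0, temperature=1.0):
    """Hypothetical combined loss; weights and KL form are assumptions.

    student_logits: SNN output with the original timestep.
    teacher_logits: output of the same SNN with an extended timestep,
                    treated as a fixed pseudo-label (no gradient).
    aux_logits:     output of a weak classifier at an intermediate stage.
    """
    task = cross_entropy(student_logits, label)
    # Temporal term: the extended-timestep output guides the student.
    temporal = kl_divergence(teacher_logits, student_logits, temperature)
    # Spatial term: the final output guides the intermediate classifier.
    spatial = kl_divergence(student_logits, aux_logits, temperature)
    return task + alpha * temporal + beta * spatial


if __name__ == "__main__":
    student = [1.0, 2.0, 0.5]
    teacher = [0.8, 2.6, 0.3]   # illustrative values only
    aux = [0.9, 1.2, 0.7]       # illustrative values only
    print(tssd_loss(student, teacher, aux, label=1))
```

Because the teacher and the weak classifier are used only inside the loss, a network trained this way would be evaluated with the original timestep and the final classifier alone, which is consistent with the claim that no inference overhead is introduced.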