In the training and inference of spiking neural networks (SNNs), direct training and lightweight computation methods have been developed orthogonally, both aiming to reduce power consumption. However, only a few approaches apply the two mechanisms simultaneously, and they fail to fully exploit SNN-based vision transformers (ViTs) because they were originally designed for convolutional neural networks (CNNs). In this paper, we propose AT-SNN, a framework that dynamically adjusts the number of tokens processed during inference in directly trained SNN-based ViTs, where power consumption is proportional to the number of tokens. We first demonstrate that adaptive computation time (ACT), previously limited to RNNs and ViTs, is applicable to SNN-based ViTs, and we extend it to selectively discard less informative spatial tokens. We also propose a new token-merging mechanism based on token similarity, which further reduces the number of tokens while improving accuracy. We implement AT-SNN on Spikformer and demonstrate its effectiveness in achieving high energy efficiency and accuracy on the image classification tasks CIFAR-10, CIFAR-100, and TinyImageNet, compared to state-of-the-art approaches. For example, on CIFAR-100 our approach uses up to 42.4% fewer tokens than the best existing method while maintaining higher accuracy.
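The similarity-based token merging described above can be illustrated by a minimal greedy sketch. This is a hedged illustration only: the cosine-similarity criterion, pairwise averaging, and the greedy selection loop are assumptions for exposition, not the paper's actual merge rule.

```python
import numpy as np

def merge_similar_tokens(tokens: np.ndarray, r: int) -> np.ndarray:
    """Greedily merge the r most similar token pairs by averaging.

    tokens: (N, D) array of token features; returns (N - r, D).
    Illustrative sketch only -- similarity metric and merge rule
    are assumptions, not AT-SNN's actual mechanism.
    """
    n = tokens.shape[0]
    # Pairwise cosine similarity between all tokens.
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # ignore self-similarity

    merged = tokens.astype(float).copy()
    alive = np.ones(n, dtype=bool)
    for _ in range(r):
        # Pick the most similar surviving pair (similarities are
        # not recomputed after a merge -- a greedy approximation).
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged[i] = (merged[i] + merged[j]) / 2  # fold j into i
        alive[j] = False
        sim[j, :] = -np.inf
        sim[:, j] = -np.inf
    return merged[alive]
```

Because downstream compute in a ViT block scales with the token count, reducing N tokens to N - r directly reduces the work (and, in an SNN, the spike activity) of every subsequent layer.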