Spiking Neural Networks (SNNs) offer an energy-efficient option for deep learning thanks to their spike-based, event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into the Transformer through the proposed Spike-driven Transformer, which has four distinctive properties: 1) Event-driven: no computation is triggered when the input to the Transformer is zero; 2) Binary spike communication: all matrix multiplications involving a spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity in both the token and channel dimensions; 4) The operations between the spike-form Query, Key, and Value are mask and addition. Together, these properties mean that the Spike-driven Transformer contains only sparse addition operations. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which uses only mask and addition operations without any multiplication and thus has up to $87.2\times$ lower computation energy than vanilla self-attention. Specifically, in SDSA, the matrix multiplication between Query, Key, and Value is designed as a mask operation. In addition, we move all residual connections in the vanilla Transformer to before the activation functions, ensuring that all neurons transmit binary spike signals. The Spike-driven Transformer achieves 77.1\% top-1 accuracy on ImageNet-1K, which is the state-of-the-art result in the SNN field. The source code is available at https://github.com/BICLab/Spike-Driven-Transformer.
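Properties 2) and 4) can be illustrated with a minimal NumPy sketch. It is not the released implementation. The function names `spike_matmul` and `toy_mask_add_attention`, the tensor shapes, and the `threshold` parameter are assumptions made for illustration only. The first function shows that multiplying by a binary spike matrix reduces to adding the weight rows selected by the spikes. The second shows one possible way to combine binary Query, Key, and Value using only masks and additions, at a cost linear in the number of tokens and channels.

```python
import numpy as np

def spike_matmul(spikes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Compute spikes @ weights with additions only (spikes must be 0/1)."""
    out = np.zeros((spikes.shape[0], weights.shape[1]), dtype=weights.dtype)
    for i in range(spikes.shape[0]):
        for j in np.flatnonzero(spikes[i]):  # silent inputs are skipped entirely
            out[i] += weights[j]             # accumulate a row; no multiplication
    return out

def toy_mask_add_attention(q, k, v, threshold=1):
    """Toy mask-and-add attention on binary (tokens, channels) matrices.

    Assumed arrangement for illustration; `threshold` is a hypothetical
    firing threshold, not a value taken from the text.
    """
    kv_mask = np.logical_and(k, v)              # element-wise product of spikes is a mask
    channel_counts = kv_mask.sum(axis=0)        # column-wise sum: additions only
    channel_gate = channel_counts >= threshold  # binarize each channel
    return np.logical_and(q, channel_gate).astype(np.uint8)  # mask Q per channel

rng = np.random.default_rng(0)
spikes = (rng.random((4, 6)) < 0.3).astype(np.uint8)  # sparse binary input
weights = rng.standard_normal((6, 3))

# The addition-only routine agrees with the ordinary matrix product.
assert np.allclose(spike_matmul(spikes, weights), spikes @ weights)

q, k, v = ((rng.random((4, 6)) < 0.5).astype(np.uint8) for _ in range(3))
out = toy_mask_add_attention(q, k, v)  # binary output with the same shape as q
```

In this sketch the work in `spike_matmul` scales with the number of spikes that fire, so an all-zero input performs no additions at all, which matches the event-driven property.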