The energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) with the artificial Transformer, using Spiking Self-Attention (SSA) to achieve both higher accuracy and lower computational cost. However, self-attention may not always be necessary, especially in sparse, spike-form computation. In this paper, we replace vanilla SSA (which uses dynamic bases calculated from Query and Key) with spike-form Fourier Transform, Wavelet Transform, and their combinations (which use fixed trigonometric or wavelet bases). This is based on the key hypothesis that both approaches use a set of basis functions to transform information. The resulting Fourier-or-Wavelet-based spikformer (FWformer) is verified on visual classification tasks, including both static image and event-based video datasets. Compared to the standard spikformer, the FWformer achieves comparable or even higher accuracy (gains of $0.4\%$-$1.5\%$), higher running speed ($9\%$-$51\%$ faster training and $19\%$-$70\%$ faster inference), lower theoretical energy consumption ($20\%$-$25\%$ reduction), and lower GPU memory usage ($4\%$-$26\%$ reduction). Our results indicate that continued refinement of new Transformers, inspired either by biological discovery (spike form) or by information theory (Fourier or Wavelet Transform), is promising.
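To make the "dynamic versus fixed basis" contrast concrete, here is a minimal NumPy sketch. It is not the FWformer implementation. The following are all assumptions made for illustration: a Heaviside threshold stands in for spiking neurons (no membrane dynamics or time steps), the SSA scale of 0.125, mixing along the token axis only, keeping just the real part of the Fourier output (an FNet-style choice), and every function name. In the SSA-like branch, the token-mixing matrix $QK^\top$ is recomputed from each input. In the Fourier and Haar-wavelet branches, the mixing matrix is a fixed basis that does not depend on the input.

```python
import numpy as np

def heaviside_spike(x, threshold=0.0):
    """Hypothetical stand-in for a spiking neuron: emit 1 where input exceeds threshold."""
    return (x > threshold).astype(np.float64)

def ssa_mixing(x, w_q, w_k, w_v, scale=0.125):
    """SSA-like token mixing: the (N, N) basis Q K^T is recomputed from the input x."""
    q = heaviside_spike(x @ w_q)
    k = heaviside_spike(x @ w_k)
    v = heaviside_spike(x @ w_v)
    dynamic_basis = scale * (q @ k.T)   # depends on x; no softmax, as in SSA
    return dynamic_basis @ v

def dft_matrix(n):
    """Fixed Fourier basis: F[k, m] = exp(-2*pi*i*k*m / n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def haar_matrix(n):
    """Fixed orthonormal Haar wavelet basis, built recursively; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        coarse = np.kron(h, [1.0, 1.0])           # averages (scaling rows)
        detail = np.kron(np.eye(m), [1.0, -1.0])  # differences (wavelet rows)
        h = np.vstack([coarse, detail]) / np.sqrt(2.0)
    return h

def fixed_basis_mixing(spikes, basis):
    """Token mixing with an input-independent basis; keep only the real part (assumed)."""
    return np.real(basis @ spikes)

# Usage: 8 tokens with 4 channels of binary spikes.
rng = np.random.default_rng(0)
spikes = heaviside_spike(rng.standard_normal((8, 4)))
weights = [rng.standard_normal((4, 4)) for _ in range(3)]
y_ssa = ssa_mixing(spikes, *weights)                         # dynamic basis
y_fourier = fixed_basis_mixing(spikes, dft_matrix(8))        # fixed trigonometric basis
y_wavelet = fixed_basis_mixing(spikes, haar_matrix(8))       # fixed wavelet basis
```

The explicit $N \times N$ matrices are shown only to emphasize the shared "set of basis functions" view. A practical Fourier branch would use an FFT (for example `np.fft.fft`), which costs $O(N \log N)$ per channel rather than $O(N^2)$. The fixed bases also need no Query/Key projections at all.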