Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate a hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representations. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, these yield QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, at a size comparable to Spikformer (66.34 M parameters, 74.81%), QKFormer (64.96 M parameters) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1K, outperforming Spikformer by 10.84 percentage points. To the best of our knowledge, this is the first time directly trained SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer
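The abstract does not detail how Q-K attention achieves linear complexity; the following is a minimal NumPy sketch of one way a spike-form token attention could work under stated assumptions. The function name `qk_token_attention`, the thresholding rule, and the shapes are illustrative assumptions, not the authors' exact implementation: a binary importance vector over tokens is derived from Q and used to mask K, so no N×N attention map is ever formed, giving O(N·D) cost.

```python
import numpy as np

def heaviside(x):
    # Stand-in for a spiking neuron's thresholding step: emit a spike (1)
    # when the input exceeds zero, otherwise stay silent (0).
    return (x > 0).astype(np.float32)

def qk_token_attention(q, k):
    """Hypothetical sketch of spike-form Q-K token attention.

    q, k: binary spike matrices of shape (N, D) -- N tokens, D channels.
    Instead of an N x N attention map (quadratic in N), a binary
    importance vector over tokens is formed from Q alone, so the whole
    operation is linear in the number of tokens.
    """
    token_scores = q.sum(axis=1, keepdims=True)            # (N, 1): per-token spike count
    token_mask = heaviside(token_scores - token_scores.mean())  # binary importance vector
    return token_mask * k                                   # keep K rows of "important" tokens

# Toy usage with random binary spike inputs.
rng = np.random.default_rng(0)
q = (rng.random((8, 16)) > 0.5).astype(np.float32)
k = (rng.random((8, 16)) > 0.5).astype(np.float32)
out = qk_token_attention(q, k)
```

Note that because both the mask and K are binary, the output itself remains spike-form, which is what allows the mechanism to stay inside the multiplication-free SNN computing paradigm.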