Spiking Transformers have gained considerable attention because they combine the energy efficiency of Spiking Neural Networks (SNNs) with the high capacity of Transformers. However, existing Spiking Transformer architectures, which are derived from Artificial Neural Networks (ANNs), exhibit a notable architectural gap that results in suboptimal performance compared to their ANN counterparts. Traditional approaches to discovering optimal architectures rely either on manual design, which is time-consuming, or on Neural Architecture Search (NAS) methods, which are usually expensive in memory footprint and computation time. To address these limitations, we introduce AutoST, a training-free NAS method that rapidly identifies high-performance and energy-efficient Spiking Transformer architectures. Existing training-free NAS methods struggle with the non-differentiability and high sparsity inherent in SNNs. We instead propose to use Floating-Point Operations (FLOPs) as a performance metric; because FLOPs are independent of model computations and training dynamics, they correlate more strongly with performance. Moreover, to enable the search for energy-efficient architectures, we leverage activation patterns at initialization to estimate the energy consumption of Spiking Transformers. Our extensive experiments show that AutoST models outperform state-of-the-art manually and automatically designed SNN architectures on static and neuromorphic datasets, while significantly reducing energy consumption.
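The idea of ranking candidates by FLOPs under an energy estimate can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the AutoST implementation: the `Candidate` fields, the simplified per-block FLOPs formula for a standard Transformer block, the energy model (FLOPs × firing rate × timesteps × energy per accumulate), the 0.9 pJ per-accumulate constant, and the `firing_rate_of` callback, which stands in for a firing rate measured on a randomly initialized network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Assumed energy per accumulate (AC) operation in picojoules; a placeholder value.
E_AC_PJ = 0.9


@dataclass(frozen=True)
class Candidate:
    """Hypothetical architecture description used only in this sketch."""
    embed_dim: int
    depth: int
    mlp_ratio: int


def block_flops(num_tokens: int, embed_dim: int, mlp_ratio: int) -> int:
    """Approximate multiply-accumulate count of one Transformer block."""
    n, d = num_tokens, embed_dim
    qkv_and_proj = 4 * n * d * d          # Q, K, V and output projections
    attention = 2 * n * n * d             # Q @ K^T and attention @ V
    mlp = 2 * n * d * (mlp_ratio * d)     # two linear layers of the MLP
    return qkv_and_proj + attention + mlp


def model_flops(c: Candidate, num_tokens: int = 64) -> int:
    """FLOPs proxy: needs no training, no gradients, no forward pass."""
    return c.depth * block_flops(num_tokens, c.embed_dim, c.mlp_ratio)


def estimated_energy_pj(c: Candidate, firing_rate: float,
                        timesteps: int = 4, num_tokens: int = 64) -> float:
    """Assumed energy model: only spiking inputs trigger accumulates."""
    return model_flops(c, num_tokens) * firing_rate * timesteps * E_AC_PJ


def search(candidates: Iterable[Candidate],
           firing_rate_of: Callable[[Candidate], float],
           energy_budget_pj: float) -> Optional[Candidate]:
    """Return the highest-FLOPs candidate whose estimated energy fits the budget."""
    feasible = [c for c in candidates
                if estimated_energy_pj(c, firing_rate_of(c)) <= energy_budget_pj]
    return max(feasible, key=model_flops, default=None)


if __name__ == "__main__":
    space = [Candidate(d, depth, 4) for d in (128, 256) for depth in (2, 4)]
    best = search(space, firing_rate_of=lambda c: 0.2, energy_budget_pj=1e8)
    print(best, model_flops(best))
```

In this sketch the firing rate is a constant passed in by the caller; in practice it would be measured per candidate from spike activity at initialization, which is why it is exposed as a callback rather than a fixed number.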