Spiking Neural Networks (SNNs), with their brain-inspired structure that uses discrete spikes instead of continuous activations, are gaining attention for their efficient processing on neuromorphic chips. While current SNN hardware accelerators often prioritize temporal spike sparsity, exploiting sparse synaptic weights offers significant untapped potential for even greater efficiency. To address this, we propose FireFly-S, a Sparse extension of the FireFly series. This co-optimized software-hardware design focuses on leveraging dual-side sparsity for acceleration. On the software side, we propose an algorithmic optimization framework that combines gradient rewiring for pruning and a modified Learned Step Size Quantization (LSQ) scheme for SNNs, achieving a weight sparsity exceeding 85\% and enabling efficient 4-bit quantization with negligible accuracy loss. On the hardware side, we present an efficient dual-side sparsity detector that employs bitmap-based sparse decoding logic to pinpoint the positions of non-zero weights and input spikes. This logic directly bypasses redundant computations, thereby enhancing computational efficiency. Different from the overlay architecture adopted by the previous FireFly series, we adopt a parametric spatial architecture with inter-layer pipelining that fully exploits the fine-grained programmability and reconfigurability of Field-Programmable Gate Arrays (FPGAs), enabling fast deployment of various models. A spatial-temporal dataflow is also proposed to support such inter-layer pipelining and avoid long-term temporal dependencies. In experiments conducted on the MNIST, DVS-Gesture and CIFAR-10 datasets, the FireFly-S model achieves 85--95\% sparsity with 4-bit quantization, and the hardware accelerator effectively leverages the dual-side sparsity, delivering performance metrics of 10,047~FPS/W on MNIST, 3,683~FPS/W on DVS-Gesture, and 2,327~FPS/W on CIFAR-10.
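The dual-side sparsity detection described above can be illustrated with a minimal software sketch. This is not the FireFly-S RTL (the function name `sparse_accumulate` and the list-based encoding are illustrative assumptions); it only shows the core idea: encode non-zero weight positions and input spikes as bitmaps, AND them, and accumulate only at the surviving positions, so every zero weight or absent spike is bypassed.

```python
# Illustrative sketch of bitmap-based dual-side sparsity detection.
# Assumption: weights and spikes for one output neuron arrive as plain
# Python lists; the hardware instead streams packed bitmaps.

def sparse_accumulate(weights, spikes):
    """Accumulate weights only where the weight is non-zero AND an input
    spike arrived; all other positions are skipped entirely."""
    # Build the two bitmaps: bit i is set iff weights[i] != 0 / spikes[i] fired.
    w_bitmap = 0
    for i, w in enumerate(weights):
        if w != 0:
            w_bitmap |= 1 << i
    s_bitmap = 0
    for i, s in enumerate(spikes):
        if s:
            s_bitmap |= 1 << i
    # Dual-side sparsity: only positions set in BOTH bitmaps need work.
    active = w_bitmap & s_bitmap
    acc = 0
    while active:
        lsb = active & -active       # isolate the lowest set bit
        i = lsb.bit_length() - 1     # index of that position
        acc += weights[i]            # spikes are binary, so just add the weight
        active ^= lsb                # clear the bit and continue
    return acc

weights = [3, 0, -2, 0, 5, 0, 1, 0]
spikes  = [1, 1, 1, 0, 1, 0, 0, 1]
print(sparse_accumulate(weights, spikes))  # positions 0, 2, 4 survive: 3 - 2 + 5 = 6
```

In hardware, the `while` loop corresponds to a priority encoder walking the set bits of the ANDed bitmap, so cycles are spent only on positions that actually contribute to the membrane potential.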