FireFly: A High-Throughput Hardware Accelerator for Spiking Neural Networks with Efficient DSP and Memory Optimization

Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency. With the introduction of the backpropagation algorithm and surrogate gradients, SNN architectures have become more complex, and the performance gap with artificial neural networks has gradually narrowed. However, most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements, which significantly restricts the development of SNNs. They either do not delve into the arithmetic operations between binary spikes and synaptic weights, or assume unlimited on-chip RAM resources by using overly expensive devices for small tasks. To improve arithmetic efficiency, we analyze the neural dynamics of spiking neurons, generalize the SNN arithmetic operation to the multiplex-accumulate operation, and propose a high-performance implementation of this operation using the DSP48E2 hard block in Xilinx UltraScale FPGAs. To improve memory efficiency, we design a memory system that enables efficient access to synaptic weights and membrane voltages with reasonable on-chip RAM consumption. Combining these two improvements, we propose an FPGA accelerator that can process spikes generated by firing neurons on-the-fly (FireFly). FireFly is the first SNN accelerator that incorporates DSP optimization techniques into SNN synaptic operations, achieving balanced resource consumption between LUTs and DSPs. FireFly is implemented on several resource-limited FPGA edge devices yet still guarantees a peak performance of 5.53 TSOP/s at 300 MHz. As a lightweight accelerator, FireFly achieves the highest computational density efficiency compared with existing research using large FPGA devices.
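
The multiplex-accumulate operation named in the abstract can be illustrated with a minimal software sketch. Because each spike is binary, the product of a spike and a weight reduces to selecting either the weight or zero, so a multiplexer followed by an adder gives the same result as a multiply-accumulate. The function names, the integer weights, and the threshold-and-reset neuron model below are assumptions made for illustration. They are not taken from the paper, and the sketch does not model how the operation is mapped onto the DSP48E2 block.

```python
from typing import Sequence, Tuple


def multiplex_accumulate(spikes: Sequence[int], weights: Sequence[int],
                         membrane: int = 0) -> int:
    """Add to `membrane` every weight whose input spike is 1 (hypothetical helper)."""
    if len(spikes) != len(weights):
        raise ValueError("spikes and weights must have the same length")
    for spike, weight in zip(spikes, weights):
        if spike not in (0, 1):
            raise ValueError("spikes must be binary (0 or 1)")
        # A 2:1 multiplexer picks the weight or zero; no multiplier is involved.
        membrane += weight if spike else 0
    return membrane


def multiply_accumulate(spikes: Sequence[int], weights: Sequence[int],
                        membrane: int = 0) -> int:
    """Reference multiply-accumulate, used only to check equivalence."""
    return membrane + sum(s * w for s, w in zip(spikes, weights))


def integrate_and_fire_step(membrane: int, spikes: Sequence[int],
                            weights: Sequence[int],
                            threshold: int) -> Tuple[int, int]:
    """One time step of an assumed integrate-and-fire neuron with reset to zero.

    Returns (new_membrane, output_spike).
    """
    membrane = multiplex_accumulate(spikes, weights, membrane)
    if membrane >= threshold:
        return 0, 1  # fire, then reset the membrane voltage (assumed hard reset)
    return membrane, 0


if __name__ == "__main__":
    spikes = [1, 0, 1, 1]
    weights = [3, -2, 5, -1]
    print(multiplex_accumulate(spikes, weights))
    print(integrate_and_fire_step(0, spikes, weights, threshold=6))
```

In this sketch the weights at positions where the spike is 0 never enter the sum, which is why a select-and-add datapath is sufficient for synaptic operations with binary inputs.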
