FireFly: A High-Throughput Hardware Accelerator for Spiking Neural Networks with Efficient DSP and Memory Optimization

Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency. With the introduction of the backpropagation algorithm and surrogate gradients, SNN structures have become more complex, and the performance gap with artificial neural networks has gradually narrowed. However, most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements, which significantly restricts the development of SNNs. These designs either do not examine the arithmetic operations between binary spikes and synaptic weights, or assume unlimited on-chip RAM resources by using overly expensive devices for small tasks. To improve arithmetic efficiency, we analyze the neural dynamics of spiking neurons, generalize the SNN arithmetic operation to the multiplex-accumulate operation, and propose a high-performance implementation of this operation using the DSP48E2 hard block in Xilinx UltraScale FPGAs. To improve memory efficiency, we design a memory system that enables efficient access to synaptic weights and membrane voltages with reasonable on-chip RAM consumption. Combining these two improvements, we propose an FPGA accelerator that can process spikes generated by the firing neurons on-the-fly (FireFly). FireFly is the first SNN accelerator that incorporates DSP optimization techniques into SNN synaptic operations. FireFly is implemented on several resource-limited FPGA edge devices and still guarantees a peak performance of 5.53 TOP/s at 300 MHz. As a lightweight accelerator, FireFly achieves the highest computational density efficiency compared with existing research using large FPGA devices.
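
The multiplex-accumulate operation mentioned in the abstract can be illustrated with a small software sketch. Because a spike is binary, the product of a spike and a weight reduces to a selection: the weight is accumulated when the spike is 1 and zero is accumulated otherwise. The function name `multiplex_accumulate` and the list-based interface are hypothetical and chosen only for illustration; this is a behavioral model, not the DSP48E2 mapping used in FireFly.

```python
def multiplex_accumulate(spikes, weights, acc=0):
    """Behavioral sketch of a multiplex-accumulate over one set of synapses.

    spikes  -- iterable of binary values (0 or 1), one per presynaptic input
    weights -- iterable of integer synaptic weights, same length as spikes
    acc     -- starting accumulator value (assumed 0 by default)

    Hypothetical helper for illustration only; it does not model the
    DSP48E2 datapath or any FireFly-specific data layout.
    """
    for spike, weight in zip(spikes, weights):
        # Multiplexer: select the weight if the spike fired, else select 0.
        selected = weight if spike else 0
        # Accumulator: add the selected value; no multiplier is required.
        acc += selected
    return acc


# The result matches an ordinary dot product with a binary vector.
spikes = [1, 0, 1, 1]
weights = [3, -2, 5, -1]
total = multiplex_accumulate(spikes, weights)
assert total == sum(s * w for s, w in zip(spikes, weights))
```

The sketch only shows why no multiplier is needed for binary spikes; how the selection and accumulation are packed into a DSP block is not described in the abstract and is not modeled here.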

