Spiking Neural Networks (SNNs) are a promising alternative to Artificial Neural Networks (ANNs) because of their strong biological interpretability and high energy efficiency. Specialized SNN hardware offers clear advantages over general-purpose devices in power and performance. However, there is still room to advance hardware support for state-of-the-art (SOTA) SNN algorithms and to improve computation and memory efficiency. As a further step toward supporting high-performance SNNs on specialized hardware, we introduce FireFly v2, an FPGA SNN accelerator that addresses the non-spike operations found in current SOTA SNN algorithms, which are an obstacle to end-to-end deployment on existing SNN hardware. To align more effectively with SNN characteristics, we design a spatiotemporal dataflow that allows four dimensions of parallelism and eliminates the need for membrane potential storage, enabling on-the-fly spike processing and spike generation. To further improve hardware acceleration performance, we develop a high-performance spike computing engine as a backend, based on a systolic array operating at 500-600 MHz. To the best of our knowledge, FireFly v2 achieves the highest clock frequency among all FPGA-based implementations. It is also the first SNN accelerator capable of supporting the non-spike operations that are commonly used in advanced SNN algorithms. FireFly v2 doubles the throughput and DSP efficiency of our previous version, FireFly, and exhibits 1.33 times the DSP efficiency and 1.42 times the power efficiency of the current most advanced FPGA accelerators.
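One way to read the claim that the dataflow removes membrane potential storage is as a change in loop order. The sketch below is a software illustration only, not the FireFly v2 hardware design. It assumes an integrate-and-fire neuron with hard reset, integer weights, and a fully connected layer, none of which are specified in the abstract; the function names are hypothetical. It compares a timestep-outer loop, which must keep one potential per neuron between timesteps, with a timestep-inner loop, where the potential lives in a single local accumulator and each output spike is emitted as soon as it is computed.

```python
def if_layer_stored(spikes_in, weights, threshold):
    """Timestep-outer loop: every neuron's potential persists in `v`
    between timesteps (the storage a streaming dataflow would avoid)."""
    n_out = len(weights)
    v = [0] * n_out  # membrane potential array kept across timesteps
    out = [[0] * n_out for _ in spikes_in]
    for t, spikes in enumerate(spikes_in):
        for j in range(n_out):
            # Binary spikes turn multiply-accumulate into conditional adds.
            v[j] += sum(w for w, s in zip(weights[j], spikes) if s)
            if v[j] >= threshold:
                out[t][j] = 1
                v[j] = 0  # hard reset (assumed neuron model)
    return out


def if_layer_streaming(spikes_in, weights, threshold):
    """Timestep-inner loop: the potential is one local accumulator per
    neuron, so no potential array survives between neurons."""
    n_out = len(weights)
    out = [[0] * n_out for _ in spikes_in]
    for j in range(n_out):
        v = 0  # local accumulator, discarded after the last timestep
        for t, spikes in enumerate(spikes_in):
            v += sum(w for w, s in zip(weights[j], spikes) if s)
            if v >= threshold:
                out[t][j] = 1  # spike generated on the fly
                v = 0
    return out


# Illustrative input: 3 timesteps, 2 inputs, 1 output neuron.
example_spikes = [[1, 0], [0, 1], [1, 1]]
example_weights = [[2, 1]]
example_out = if_layer_streaming(example_spikes, example_weights, threshold=3)
```

Both functions perform the same additions in the same order for each neuron, so they produce identical spike trains; only the lifetime of the potential differs. In hardware the timestep-inner order would require the input spikes for all timesteps to be available together, which is an assumption of this sketch rather than a detail taken from the abstract.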