Spiking Neural Networks (SNNs) have emerged as a promising approach to improving the energy efficiency of machine learning models, as they naturally implement event-driven computation while avoiding expensive multiplication operations. In this paper, we develop a hardware-software co-optimisation strategy to port software-trained deep neural network (DNN) models to reduced-precision spiking models, demonstrating fast and accurate inference on a novel event-driven CMOS reconfigurable spiking inference accelerator. Experimental results show that reduced-precision ResNet-18 and VGG-11 SNN models achieve classification accuracy within 1% of the baseline full-precision DNN models within 8 spike timesteps. We also demonstrate an FPGA prototype implementation of the spiking inference accelerator that achieves a throughput of 38.4 giga operations per second (GOPS) while consuming 1.54 W on a PYNQ-Z2 FPGA. This corresponds to 0.6 GOPS per processing element and 2.25 GOPS per DSP slice, a 2x and 4.5x higher utilisation efficiency, respectively, compared to the state of the art. Our co-optimisation strategy can be employed to develop deep reduced-precision SNN models and port them to resource-efficient event-driven hardware accelerators for edge applications.
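To make the "event-driven, multiplication-free" claim concrete, the following is a minimal sketch, not the paper's implementation, of one timestep of an integrate-and-fire (IF) layer driven by binary spike inputs. The IF model, the soft-reset rule, the threshold value, and the Poisson-like input encoding are illustrative assumptions; the key point it demonstrates is that with 0/1 spike inputs, each synaptic update reduces to a conditional addition of a weight, so inference needs only accumulations and comparisons.

```python
import numpy as np

def if_layer_step(spikes_in, weights, v_mem, threshold=1.0):
    """One timestep of an integrate-and-fire layer (illustrative sketch).

    spikes_in: binary vector (n_in,) of input spikes at this timestep.
    weights:   (n_out, n_in) synaptic weight matrix (reduced precision in hardware).
    v_mem:     (n_out,) membrane potentials, carried across timesteps.
    """
    # Event-driven: only inputs that actually fired are touched.
    active = np.flatnonzero(spikes_in)
    # Because inputs are 0/1, the weighted sum is just a sum of the weight
    # columns selected by the active inputs -- additions only, no multiplies.
    v_mem = v_mem + weights[:, active].sum(axis=1)
    spikes_out = (v_mem >= threshold).astype(np.uint8)
    # Soft reset: subtract the threshold from neurons that fired.
    v_mem = np.where(spikes_out == 1, v_mem - threshold, v_mem)
    return spikes_out, v_mem

# Rate-coded inference over T=8 timesteps, matching the spike window in the
# abstract; the layer sizes and random inputs below are purely illustrative.
rng = np.random.default_rng(0)
W = rng.standard_normal((10, 100)).astype(np.float32)
v = np.zeros(10, dtype=np.float32)
counts = np.zeros(10, dtype=np.int32)
for _ in range(8):
    s_in = (rng.random(100) < 0.2).astype(np.uint8)  # Poisson-like input spikes
    s_out, v = if_layer_step(s_in, W, v)
    counts += s_out
prediction = counts.argmax()  # class with the highest output spike count
```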