Spiking Neural Networks (SNNs) offer a promising route to energy-efficient edge intelligence; however, their hardware deployment is constrained by memory overhead, inefficient scaling operations, and limited parallelism. This work proposes L-SPINE, a low-precision, SIMD-enabled spiking neural compute engine for efficient edge inference. The architecture features a unified multi-precision datapath supporting 2-bit, 4-bit, and 8-bit operations and uses a multiplier-less shift-add model for neuron dynamics and synaptic accumulation. Implemented on an AMD VC707 FPGA, the proposed neuron requires only 459 LUTs and 408 FFs, with a critical-path delay of 0.39 ns and a power consumption of 4.2 mW. At the system level, L-SPINE uses 46.37K LUTs and 30.4K FFs, with a latency of 2.38 ms and a power consumption of 0.54 W. Compared with CPU and GPU platforms, it reduces inference latency from seconds to milliseconds and improves energy efficiency by up to three orders of magnitude. Quantisation analysis shows that the INT2 and INT4 configurations significantly reduce memory footprint with minimal accuracy loss. These results establish L-SPINE as a scalable and efficient solution for real-time edge SNN deployment.
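To make the multiplier-less shift-add idea concrete, the following is a minimal Python sketch of one integer leaky integrate-and-fire update fed by packed low-precision weights. It is illustrative only and is not the L-SPINE design: the function names `unpack_weights` and `lif_step`, the little-endian lane packing, the leak shift of 4, the threshold of 64, and the hard reset to zero are all assumptions that do not come from the abstract.

```python
def unpack_weights(word: int, bits: int, lanes: int) -> list[int]:
    """Split a packed word into `lanes` signed two's-complement weights.

    Assumed layout: lane 0 occupies the least significant `bits` bits.
    """
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    weights = []
    for lane in range(lanes):
        raw = (word >> (lane * bits)) & mask
        # Sign-extend: values with the top bit set are negative.
        weights.append(raw - (1 << bits) if raw & sign_bit else raw)
    return weights


def lif_step(v: int, spikes: list[int], weights: list[int],
             leak_shift: int = 4, threshold: int = 64) -> tuple[int, int]:
    """One timestep of an integer LIF neuron using only adds and shifts.

    Returns (new membrane potential, output spike).
    """
    # Synaptic accumulation: input spikes are binary, so each synapse
    # contributes either its weight or nothing (no multiplication).
    current = sum(w for s, w in zip(spikes, weights) if s)
    # Leak: v * (1 - 2**-leak_shift) computed as v - (v >> leak_shift).
    v = v - (v >> leak_shift) + current
    if v >= threshold:
        return 0, 1  # assumed hard reset after firing
    return v, 0


# One 8-bit word carries four INT2 weights, or one INT8 weight.
int2_weights = unpack_weights(0b11100100, bits=2, lanes=4)
print(int2_weights)
print(lif_step(32, [1, 0, 1, 1], int2_weights))
```

Under these assumptions the same 8-bit word holds four INT2 weights, two INT4 weights, or one INT8 weight, which is the sense in which a single datapath can trade precision for parallel lanes.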