Spiking Neural Networks (SNNs) have emerged as a biologically inspired alternative to conventional deep networks, offering event-driven and energy-efficient computation. However, their throughput remains constrained by the serial update of neuron membrane states. While many hardware accelerators and Compute-in-Memory (CIM) architectures efficiently parallelize the synaptic operation (W × I), achieving O(1) complexity for matrix-vector multiplication, the subsequent state-update step still requires O(N) time to refresh all neuron membrane potentials. This mismatch makes the state update the dominant latency and energy bottleneck in SNN inference. To address this challenge, we propose an SRAM-based CIM architecture for SNNs with a Linear Decay Leaky Integrate-and-Fire (LD-LIF) neuron that co-optimizes the algorithm and the hardware. At the algorithmic level, we replace the conventional exponential membrane decay with a linear decay approximation, converting costly multiplications into simple additions at an accuracy cost of only about 1%. At the architectural level, we introduce an in-memory parallel update scheme that performs the decay in place within the SRAM array, eliminating the need for global sequential updates. Evaluated on benchmark SNN workloads, the proposed method reduces synaptic-operation (SOP) energy consumption by 1.1× to 16.7× and improves energy efficiency by 15.9× to 69×, with negligible accuracy loss relative to the original decay models. This work highlights that, beyond accelerating the (W × I) computation, optimizing state-update dynamics within CIM architectures is essential for scalable, low-power, real-time neuromorphic processing.
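To make the algorithmic substitution concrete, the NumPy sketch below contrasts a conventional exponential-decay LIF step with a linear-decay (LD-LIF) step. This is a minimal illustration, not the paper's hardware mapping: the decay factor `beta`, the linear decrement `delta`, the clamp at zero, and the hard reset are illustrative assumptions.

```python
import numpy as np

def lif_step_exponential(v, i_syn, beta=0.9, v_th=1.0):
    """Conventional LIF: exponential decay costs one multiply per neuron."""
    v = beta * v + i_syn                      # O(N) multiplications each timestep
    spikes = (v >= v_th).astype(v.dtype)      # fire where potential crosses threshold
    v = v * (1.0 - spikes)                    # hard reset of fired neurons (assumed)
    return v, spikes

def lif_step_linear_decay(v, i_syn, delta=0.1, v_th=1.0):
    """LD-LIF: a fixed linear decrement replaces the multiply with an add."""
    v = np.maximum(v - delta, 0.0) + i_syn    # subtract-and-clamp, additions only
    spikes = (v >= v_th).astype(v.dtype)
    v = v * (1.0 - spikes)                    # same reset rule as above (assumed)
    return v, spikes
```

The relevant property is that the linear form needs only constant subtraction per neuron, which, per the abstract, is what allows the decay to be applied in place and in parallel across SRAM rows instead of through a per-neuron multiply-accumulate.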