SupraSNN: Exploiting Synapse-Level Parallelism in Spiking Neural Network Accelerators through Co-Optimized Mapping and Scheduling

Spiking Neural Networks (SNNs) offer a brain-inspired path toward highly efficient computation, but their practical deployment is constrained by the difficulty of managing and executing their massive parallelism on physical hardware. This problem mirrors a historical challenge in processor design: moving beyond serial execution, a barrier that superscalar architectures broke by dispatching multiple instructions to parallel functional units. Drawing on this paradigm, we introduce a hardware-software co-design framework that treats synaptic events as parallelizable micro-operations. We present SupraSNN, a superscalar-inspired architecture that achieves high synapse-level parallelism by physically decoupling synaptic and neuronal computation. Within this architecture, a Multi-Cast Tree routes spike data to multiple parallel Synapse Processing Units, which serve as the computational pipelines, while a Merge Tree consolidates their distributed results for processing by a unified Neuron Unit. The Neuron Unit deliberately centralizes the complex neuron state dynamics to reduce hardware overhead and avoid resource duplication. The architecture's efficacy depends on a partitioning and scheduling framework: partitioning first maps the SNN onto the hardware within its memory constraints, and heuristic scheduling then determines the synaptic execution order to maximize throughput and resource utilization. On a feedforward SNN trained on MNIST (93.44% accuracy), SupraSNN achieves 149 μs inference latency and 0.025 mJ per image (0.276 nJ per synapse) on the Xilinx Zynq XC7Z020 FPGA, delivering 47.6% lower latency and 5.6× better energy efficiency than prior FPGA-based SNN accelerators. Beyond vision tasks, a recurrent SNN on the Spiking Heidelberg Dataset (71.82% accuracy) achieves 1.41 ms latency and 0.77 mJ per sample on the XC7Z030.
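The decoupled dataflow described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the SupraSNN implementation: the function names (`partition`, `spu_accumulate`, `merge_tree`, `neuron_unit`, `step`) are hypothetical, SPU memory is assumed to be counted in synapses, a least-loaded greedy placement stands in for the paper's partitioning and scheduling heuristic (whose details are not given here), and neurons are assumed to be leaky integrate-and-fire (LIF) with reset to zero.

```python
# Hypothetical software model of a SupraSNN-style decoupled dataflow.
# Assumptions (not from the paper): synapses are partitioned by presynaptic
# neuron, SPU memory is counted in synapses, placement is least-loaded greedy,
# and neurons are leaky integrate-and-fire (LIF) with reset to zero.
from typing import Dict, List, Sequence, Set, Tuple

Synapse = Tuple[int, int, float]  # (presynaptic id, postsynaptic id, weight)


def partition(synapses: Sequence[Synapse], num_spus: int, capacity: int) -> List[List[Synapse]]:
    """Place synapse groups on SPUs without exceeding each SPU's capacity."""
    groups: Dict[int, List[Synapse]] = {}
    for syn in synapses:
        groups.setdefault(syn[0], []).append(syn)
    spus: List[List[Synapse]] = [[] for _ in range(num_spus)]
    # Largest fan-out groups first, each onto the least-loaded SPU that fits.
    for group in sorted(groups.values(), key=len, reverse=True):
        fits = [i for i in range(num_spus) if len(spus[i]) + len(group) <= capacity]
        if not fits:
            raise ValueError("network does not fit in SPU memory")
        spus[min(fits, key=lambda i: len(spus[i]))].extend(group)
    return spus


def spu_accumulate(local: Sequence[Synapse], spikes: Set[int], num_post: int) -> List[float]:
    """One SPU: sum the weights of its synapses whose presynaptic neuron spiked."""
    partial = [0.0] * num_post
    for pre, post, weight in local:
        if pre in spikes:
            partial[post] += weight
    return partial


def merge_tree(partials: List[List[float]]) -> List[float]:
    """Pairwise (binary-tree) reduction of per-SPU partial sums."""
    level = partials
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
            else:
                nxt.append(level[i])  # odd element passes through to the next level
        level = nxt
    return level[0]


def neuron_unit(v: List[float], current: List[float], leak: float, threshold: float) -> Set[int]:
    """Centralized LIF update: leak, integrate, fire, then reset to zero (mutates v)."""
    fired: Set[int] = set()
    for j, i_in in enumerate(current):
        v[j] = leak * v[j] + i_in
        if v[j] >= threshold:
            fired.add(j)
            v[j] = 0.0
    return fired


def step(spus: List[List[Synapse]], spikes: Set[int], v: List[float],
         leak: float = 0.9, threshold: float = 1.0) -> Set[int]:
    """One timestep: multicast spikes to every SPU, merge, then update neurons."""
    partials = [spu_accumulate(local, spikes, len(v)) for local in spus]
    return neuron_unit(v, merge_tree(partials), leak, threshold)


if __name__ == "__main__":
    syn = [(0, 0, 0.6), (1, 0, 0.5), (2, 1, 0.3), (1, 1, 0.4), (2, 0, 0.2)]
    spus = partition(syn, num_spus=2, capacity=3)
    v = [0.0, 0.0]
    print(step(spus, {0, 1}, v))  # {0}
    print(v)                      # [0.0, 0.4]
```

Because this sketch partitions by presynaptic neuron, one postsynaptic neuron can receive partial sums from several SPUs, so a reduction stage is needed before the single neuron update; keeping that update in one place mirrors the abstract's rationale for a unified Neuron Unit.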

