NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities

The exponential growth of artificial intelligence (AI) applications has exposed the inefficiency of conventional von Neumann architectures, where frequent data transfers between compute units and memory create significant energy and latency bottlenecks. Analog Computing-in-Memory (ACIM) addresses this challenge by performing multiply-accumulate (MAC) operations directly in the memory arrays, substantially reducing data movement. However, designing robust ACIM accelerators requires accurate modeling of device- and circuit-level non-idealities. In this work, we present NeuroSim V1.5, which introduces several key advances: (1) seamless integration with TensorRT's post-training quantization flow, enabling support for a wider range of neural networks, including transformers; (2) a flexible noise injection methodology built on pre-characterized statistical models, making it straightforward to incorporate data from SPICE simulations or silicon measurements; (3) expanded device support, including emerging non-volatile capacitive memories; and (4) up to 6.5× faster runtime than NeuroSim V1.4 through optimized behavioral simulation. Together, these capabilities uniquely enable systematic design space exploration across both accuracy and hardware efficiency metrics. Through multiple case studies, we demonstrate how critical design parameters can be optimized while maintaining network accuracy. By bridging high-fidelity noise modeling with efficient simulation, NeuroSim V1.5 advances the design and validation of next-generation ACIM accelerators. All NeuroSim versions are available open-source at https://github.com/neurosim/NeuroSim.
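The noise injection idea in point (2) can be illustrated with a small sketch. The Python below is a minimal illustration under stated assumptions, not NeuroSim's code or API: the function names `ideal_column_mac` and `noisy_column_mac`, the lookup table keyed by the ideal column output, the Gaussian error model, and the round-and-clip ADC are all hypothetical choices made for this example. The point it shows is that a pre-characterized statistical model (for instance, one fitted from SPICE runs or silicon measurements) can be applied behaviorally, by sampling an error for each ideal MAC value instead of simulating the circuit.

```python
import random
from typing import Dict, Sequence, Tuple

# Hypothetical pre-characterized noise model (an assumption, not NeuroSim's format):
# ideal column output -> (mean error, standard deviation) observed at the ADC input.
NoiseTable = Dict[int, Tuple[float, float]]


def ideal_column_mac(inputs: Sequence[int], weights: Sequence[int]) -> int:
    """Exact dot product of binary input bits and integer cell weights on one column."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def noisy_column_mac(
    inputs: Sequence[int],
    weights: Sequence[int],
    noise_table: NoiseTable,
    adc_max: int,
    rng: random.Random,
) -> int:
    """Ideal MAC plus a sampled analog error, then rounded and clipped like an ADC."""
    ideal = ideal_column_mac(inputs, weights)
    # Values missing from the table are treated as noiseless in this sketch.
    mean, std = noise_table.get(ideal, (0.0, 0.0))
    analog = ideal + rng.gauss(mean, std)
    # Hypothetical ADC: round to the nearest level and clip to [0, adc_max].
    return min(max(round(analog), 0), adc_max)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up numbers for illustration only: noise grows with the output value.
    table = {k: (0.0, 0.3 + 0.05 * k) for k in range(9)}
    x = [1, 0, 1, 1, 0, 1, 0, 1]
    w = [1, 1, 0, 1, 1, 1, 0, 1]
    print("ideal:", ideal_column_mac(x, w))
    print("noisy:", noisy_column_mac(x, w, table, adc_max=7, rng=rng))
```

Keying the error statistics by the ideal output value is one simple way to capture output-dependent effects such as larger variation at high currents; a real model may also depend on the input pattern, temperature, or the specific device.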

翻译：人工智能（AI）应用的指数级增长暴露了传统冯·诺依曼架构的低效性，其中计算单元与存储器之间的频繁数据传输造成了显著的能耗与延迟瓶颈。模拟存内计算（ACIM）通过在存储器阵列中直接执行乘累加（MAC）运算来应对这一挑战，从而大幅减少数据移动。然而，设计稳健的ACIM加速器需要对器件级和电路级的非理想特性进行精确建模。本工作中，我们提出了NeuroSim V1.5，引入了若干关键改进：（1）与TensorRT的训练后量化流程无缝集成，从而支持包括Transformer在内的更多神经网络；（2）基于预表征统计模型的灵活噪声注入方法，使得整合来自SPICE仿真或硅片测量数据变得简便；（3）扩展了器件支持范围，包括新兴的非易失性电容存储器；（4）通过优化的行为级仿真，运行速度相比NeuroSim V1.4提升高达6.5倍。这些功能的结合独特地实现了跨精度与硬件效率指标的系统化设计空间探索。通过多个案例研究，我们展示了在保持网络精度的同时优化关键设计参数的过程。通过将高保真噪声建模与高效仿真相结合，NeuroSim V1.5推动了下一代ACIM加速器的设计与验证。所有NeuroSim版本均在https://github.com/neurosim/NeuroSim开源提供。