E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

This work presents E-ReCON, a 16 Kb energy- and resource-efficient digital compute-in-memory (DCIM) macro based on a compact 3T1R ReRAM bitcell for edge-AI inference. The proposed bitcell occupies only 0.85 µm² and supports reliable AND-based in-memory multiplication for both conventional convolutional neural network (CNN) and spiking neural network (SNN) workloads. To reduce accumulation overhead, a novel interleaved 10T/28T adder tree is introduced, which reduces transistor count and power consumption by 37% and 28%, respectively, compared to a conventional 28T ripple-carry-adder (RCA)-based design. Implemented in 65 nm CMOS at 1.2 V, the proposed macro achieves a minimum latency of 0.48 ns, a throughput of 2.31-3.1 TOPS, and an energy efficiency of up to 419 TOPS/W. When evaluated on LeNet-5, AlexNet, and CNN-8 models, the macro achieves 97.81%, 93.23%, and 96.51% accuracy on the MNIST/A-Z, CIFAR-10, and SVHN datasets, respectively. In addition, 40% pruning preserves nearly 99.8% of the original accuracy while reducing the number of MAC operations and computation cycles. For SNN-oriented workloads, the proposed AND-type bitcell efficiently supports spike-weight multiplication with low switching activity, and the 2A2W (2-bit activation, 2-bit weight) configuration achieves accuracy close to the FP32 baseline across VGG-8, VGG-16, and ResNet-18 networks on the CIFAR-10, CIFAR-100, and ImageNet-1K datasets. Compared to prior ADC-based ReRAM-CIM designs, the proposed architecture improves latency and energy efficiency by roughly 30-40% while maintaining robust operation under full process, voltage, and temperature (PVT) variation and ReRAM variability. Overall, E-ReCON provides a scalable, low-latency, and energy-efficient nvCIM platform for next-generation edge-AI, IoT, biomedical sensing, and neuromorphic applications.
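
To illustrate how AND-based multiplication and adder-tree accumulation can combine into a precision-configurable MAC, the following is a minimal behavioral sketch in Python. It is not the E-ReCON circuit or the authors' dataflow: the bit-serial scheduling, the unsigned operands, and all function names (`to_bits`, `adder_tree`, `and_mac`) are assumptions made for illustration, and the 10T/28T adder cells are not modeled.

```python
def to_bits(value, n_bits):
    """Return the n_bits least-significant bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n_bits)]


def adder_tree(values):
    """Sum a list by pairwise reduction, mimicking the levels of an adder tree."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:          # pad odd-length levels with a zero input
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def and_mac(activations, weights, a_bits=2, w_bits=2):
    """Unsigned dot product built only from 1-bit ANDs, an adder tree, and shifts.

    Assumption: each (activation bit i, weight bit j) pair is processed as one
    bit-plane, and the plane sums are shifted by i + j before accumulation.
    """
    a_planes = [to_bits(a, a_bits) for a in activations]
    w_planes = [to_bits(w, w_bits) for w in weights]
    total = 0
    for i in range(a_bits):
        for j in range(w_bits):
            # 1-bit multiplication is a logical AND of the two bits.
            partial = [a[i] & w[j] for a, w in zip(a_planes, w_planes)]
            total += adder_tree(partial) << (i + j)
    return total


# 2-bit activations and 2-bit weights (a 2A2W-style setting).
print(and_mac([3, 1, 2, 0], [1, 2, 3, 3]))

# Binary spikes as 1-bit activations: only weights at spiking inputs contribute.
print(and_mac([1, 0, 1, 1], [2, 3, 1, 0], a_bits=1))
```

With 1-bit activations, a zero spike forces every AND output in its column to zero, which is one way to picture why spike-weight multiplication keeps switching activity low.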
