Accommodating all the weights of large-scale neural networks (NNs) on-chip remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM), which has limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs (SL-ReRAMs) on top of high-efficiency SRAM-CIM for weight storage, eliminating off-chip memory access. However, this single-level approach (SL-nvSRAM-CIM) scales poorly as the number of SL-ReRAMs increases and offers limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAM-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. Clustered n-selector-n-ReRAM cells (cluster-nSnRs) are employed for reliable weight restore while eliminating DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density than state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x higher energy efficiency than the baseline SRAM-CIM and ReRAM-CIM designs, respectively.
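As a purely arithmetic illustration of the ternary MAC with differential computing mentioned in the abstract, the sketch below splits ternary weights in {-1, 0, +1} into a positive and a negative binary plane, accumulates each plane separately, and subtracts the two partial sums. The function names (`split_ternary`, `differential_ternary_mac`) and the two-plane encoding are assumptions made for illustration; the abstract does not describe the actual circuit-level mechanism, which may differ.

```python
from typing import List, Tuple


def split_ternary(weights: List[int]) -> Tuple[List[int], List[int]]:
    """Split ternary weights into positive and negative binary planes.

    Hypothetical encoding: +1 -> (1, 0), -1 -> (0, 1), 0 -> (0, 0).
    """
    pos = [1 if w == 1 else 0 for w in weights]
    neg = [1 if w == -1 else 0 for w in weights]
    return pos, neg


def differential_ternary_mac(inputs: List[int], weights: List[int]) -> int:
    """Compute sum(x * w) for ternary w as a difference of two binary MACs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if any(w not in (-1, 0, 1) for w in weights):
        raise ValueError("weights must be ternary: -1, 0, or +1")

    pos, neg = split_ternary(weights)
    # Each plane only needs a binary-weight accumulation.
    pos_sum = sum(x * p for x, p in zip(inputs, pos))
    neg_sum = sum(x * n for x, n in zip(inputs, neg))
    # The differential step: the result is the difference of the two sums.
    return pos_sum - neg_sum


if __name__ == "__main__":
    x = [3, 1, 4, 2]
    w = [1, -1, 0, 1]
    print(differential_ternary_mac(x, w))
```

Under this assumed encoding, zero weights contribute to neither plane, so the differential result equals the ordinary dot product of the inputs with the ternary weights.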