Deep neural networks (DNNs) have achieved state-of-the-art performance across diverse domains. However, conventional von Neumann compute paradigms face severe memory bottlenecks. Emerging near-memory and compute-in-memory approaches alleviate this bottleneck but incur significant peripheral overhead. Computational Random Access Memory (CRAM) based on MRAM enables in-situ logic without peripheral overhead, offering a dense, energy-efficient solution. However, probabilistic MRAM switching induces gate-level errors that limit the scalability and reliability of CRAM for accelerating DNNs. Moreover, the large number of sequential MRAM writes severely constrains CRAM throughput. To address these challenges, we propose an error-resilient CRAM (CRAM-ER) architecture for scalable in-memory matrix-vector multiplications (MVMs). Our error-aware hardware-software co-design framework leverages a hybrid architecture that combines spintronic CRAM with a CMOS adder tree to mitigate the impact of device-level errors, demonstrating MVM functionality with high area and energy efficiency. We further develop error-aware model fine-tuning and fine-grained error correction for enhanced error resilience. Evaluations of the hybrid CMOS+spintronic architecture on DNN benchmarks show near-lossless accuracy while reducing CRAM latency by up to two orders of magnitude, outperforming CPU/GPU systems paired with high-bandwidth DRAM in both energy efficiency and energy-delay product.
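To make the error mechanism concrete, the following is a minimal illustrative sketch, not the CRAM-ER design itself. It assumes a simplified model in which each 1-bit partial product (an AND of a weight bit and an input bit) is computed in memory and flips independently with probability `p_flip`, while the shift-and-add accumulation is exact, standing in for a CMOS adder tree. The function names, the unsigned fixed-point encoding, and the independent-flip error model are all assumptions made for illustration.

```python
import numpy as np

def to_bits(a, n_bits):
    """Split unsigned integers into bit planes (LSB first) on a new last axis."""
    return (a[..., None] >> np.arange(n_bits)) & 1

def noisy_mvm(W, x, n_bits, p_flip, rng):
    """Matrix-vector product with error-prone 1-bit partial products.

    W: (m, n) unsigned ints, x: (n,) unsigned ints, all values < 2**n_bits.
    Each AND-gate output flips with probability p_flip (assumed error model);
    the weighted accumulation of partial products is exact.
    """
    wb = to_bits(W, n_bits)                      # (m, n, n_bits)
    xb = to_bits(x, n_bits)                      # (n, n_bits)
    # AND every weight bit i with every input bit j: shape (m, n, i, j)
    pp = wb[:, :, :, None] & xb[None, :, None, :]
    # Inject independent gate-level bit flips
    flips = (rng.random(pp.shape) < p_flip).astype(pp.dtype)
    pp = pp ^ flips
    # Exact accumulation: partial product (i, j) carries weight 2**(i + j)
    idx = np.arange(n_bits)
    weights = 1 << (idx[:, None] + idx[None, :])
    return (pp * weights).sum(axis=(1, 2, 3))

rng = np.random.default_rng(0)
W = rng.integers(0, 16, size=(4, 8))
x = rng.integers(0, 16, size=8)

exact = W @ x
noisy = noisy_mvm(W, x, n_bits=4, p_flip=0.01, rng=rng)
print("exact:", exact)
print("noisy:", noisy)
```

With `p_flip=0.0` the function reproduces `W @ x` exactly. Raising `p_flip` shows how flips in high-order partial products dominate the output error, which is the kind of sensitivity that error-aware fine-tuning and fine-grained correction are meant to address.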