AI chips commonly use SRAM for on-chip buffers because its reliability and speed contribute to high performance. However, SRAM is expensive and consumes significant area and energy. Previous studies have explored replacing SRAM with emerging technologies such as non-volatile memory, which offers fast reads and a small cell area. Despite these advantages, the slow writes and high write energy of non-volatile memory prevent it from surpassing SRAM in AI applications with extensive memory access requirements. Other research has investigated eDRAM as an area-efficient on-chip memory with access times similar to those of SRAM. Still, refresh power remains a concern, leaving the trade-off among performance, area, and power consumption unresolved. To address this issue, we present MCAIMem, a novel mixed CMOS cell memory design that balances performance, area, and energy efficiency for AI memory by combining SRAM and eDRAM cells. To reduce area, the memory uses a ratio of one SRAM cell to seven eDRAM cells. Additionally, we capitalize on the characteristics of DNN data representation and integrate asymmetric eDRAM cells to lower energy consumption. To validate MCAIMem, we conduct extensive simulations and benchmark it against traditional SRAM. Our results demonstrate that MCAIMem significantly outperforms SRAM in area and energy efficiency. Specifically, MCAIMem reduces area by 48\% and energy consumption by 3.4$\times$ compared to SRAM designs, without incurring any accuracy loss.