The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This shift imposes stringent energy-efficiency constraints that traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate because it exploits the available memory bandwidth efficiently. However, existing CIM solutions require high implementation effort and lack flexibility from a software integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized. They target different trade-offs among area efficiency, performance, and flexibility, and together cover a wide range of embedded microcontrollers. Post-layout simulations show that, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC), NM-Caesar and NM-Carus achieve up to $25.8\times$ and $50.0\times$ lower execution time and $23.2\times$ and $33.1\times$ higher system-level energy efficiency, respectively. NM-Carus reaches a peak energy efficiency of $306.7$ GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.