Progress in artificial intelligence (AI) models is currently driven by a significant upscaling of their complexity, with massive matrix-multiplication workloads representing the major computational bottleneck. In-memory computing (IMC) architectures have been proposed to avoid the von Neumann bottleneck. However, both digital/binary-based and analog IMC architectures suffer from various limitations, which significantly degrade their performance and energy-efficiency gains. This work proposes OISMA, an energy-efficient IMC architecture that exploits the computational simplicity of a quasi-stochastic computing (SC) domain, the bent-pyramid (BP) system, while retaining the efficiency, scalability, and productivity of digital memories. OISMA converts normal memory read operations into in situ stochastic multiplication operations at negligible cost. An accumulation periphery then accumulates the output multiplication bitstreams, achieving the matrix multiplication (MatMul) functionality. A 4-kB 1T1R OISMA array was implemented using a commercial 180-nm technology node and in-house resistive random-access memory (RRAM) technology. At 50 MHz, it achieves an energy efficiency of 0.789 TOPS/W and an area efficiency of 3.98 GOPS/mm², occupying an effective computing area of 0.804241 mm². Scaling OISMA to 22-nm technology shows an improvement of two orders of magnitude in energy efficiency and one order of magnitude in area efficiency compared to dense MatMul IMC architectures.
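The multiply-then-accumulate flow described in the abstract can be illustrated with a small software sketch. The abstract does not specify the bent-pyramid encoding, so this example substitutes conventional unipolar stochastic computing as a stand-in: each value in [0, 1] becomes a random bitstream whose fraction of ones equals the value, a bitwise AND of two independent streams acts as the multiplier, and counting ones plays the role of the accumulation periphery. The function name `sc_matmul`, the stream length, and the seed are hypothetical choices made for illustration; this is not the OISMA circuit or its number system.

```python
import numpy as np


def sc_matmul(A, B, length=4096, seed=0):
    """Approximate A @ B for entries in [0, 1] with AND-based stochastic multiplication.

    Assumption: plain unipolar random bitstreams, not the bent-pyramid system.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n, k = A.shape
    k2, m = B.shape
    if k != k2:
        raise ValueError("inner dimensions must match")

    # Encode every operand as an independent bitstream: P(bit == 1) equals the value.
    a_bits = (rng.random((n, k, length)) < A[:, :, None]).astype(np.int64)
    b_bits = (rng.random((k, m, length)) < B[:, :, None]).astype(np.int64)

    # For 0/1 bits, the product a*b equals a AND b (the stochastic multiplier).
    # Summing over time steps t and the inner index k is the accumulation step.
    counts = np.einsum("ikt,kjt->ij", a_bits, b_bits)

    # Convert accumulated one-counts back to values.
    return counts / length


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    A = demo_rng.random((4, 8))
    B = demo_rng.random((8, 3))
    approx = sc_matmul(A, B)
    exact = A @ B
    print("max abs error:", np.max(np.abs(approx - exact)))
```

In this stand-in the error shrinks roughly with the square root of the stream length, which is the usual accuracy versus latency trade-off of random-bitstream SC; a quasi-stochastic encoding such as the one the abstract names is presumably designed to behave differently.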