With the rapid growth of deep neural networks (DNNs), compute-in-memory (CIM) has emerged as a promising energy-efficient paradigm for accelerating multiply-and-accumulate (MAC) operations. However, current CIM architectures are largely limited to dot-product computations and struggle to support general matrix operations efficiently, such as transpose, element-wise addition, and element-wise multiplication. This work presents a 3D-integrated, memory-on-memory SRAM--eDRAM hybrid CIM architecture, implemented in GlobalFoundries 22~nm FDSOI technology, that performs general matrix operations directly within the memory crossbar at 4-bit precision. By combining a specialized transpose-based architecture, in-memory arithmetic operations, peripheral-aware design, and 3D SRAM--eDRAM integration, the proposed architecture balances latency, energy efficiency, and compute density for general-purpose matrix operations while remaining compatible with conventional dot-product CIM architectures. Overall, this memory-on-memory CIM framework generalizes CIM beyond dot products, enabling versatile matrix processing and paving the way for broader applications in AI acceleration and general-purpose high-performance computing.