Processing-in-Memory (PIM) architectures enable computation directly within DRAM and help combat the memory wall problem. Bit-shifting is a fundamental operation underlying PIM applications such as shift-and-add multiplication, carry-propagation adders, and the Galois field arithmetic used in AES encryption and Reed-Solomon error correction codes. Existing approaches to in-DRAM shifting either add dedicated shifter circuits beneath the sense amplifiers to move data horizontally across adjacent bitlines, or adopt vertical data layouts, which store operand bits along a bitline so that shifts become row-copy operations. In this paper, we propose a novel DRAM subarray design that enables in-DRAM bit-shifting for open-bitline architectures. Our design builds upon prior work that introduced a "migration cell", a new type of cell used for row migration in asymmetric subarrays. We repurpose and extend this cell by adding a row of migration cells at the top and bottom of each subarray, which enables bidirectional bit-shifting within any given row. The design remains compatible with standard DRAM operations. Unlike previous approaches to shifting, it operates on horizontally stored data, eliminating the need for data transposition and its overhead, and it leverages existing cell structures, eliminating the need for additional complex logic and circuitry. We evaluate the design through timing and energy analysis using NVMain, circuit-level validation of the in-DRAM shift operation using LTSPICE, and a VLSI layout implementation in Cadence Virtuoso.
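To illustrate why a single-bit shift is such a useful primitive for the applications named in the abstract, the following is a minimal software sketch of shift-and-add multiplication and of multiplication in GF(2^8). It models only the arithmetic, not the proposed subarray design or its timing. The function names, the 8-bit operand width, and the choice of the AES reduction polynomial 0x11B are illustrative assumptions rather than details taken from the paper.

```python
def shift_add_mul(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply built from 1-bit shifts, ANDs, and adds.

    The result is truncated to `width` bits, mimicking a fixed-width row.
    """
    mask = (1 << width) - 1
    acc = 0
    for _ in range(width):
        if b & 1:                 # current multiplier bit set: add the addend
            acc = (acc + a) & mask
        a = (a << 1) & mask       # left-shift the addend by one bit
        b >>= 1                   # right-shift to expose the next multiplier bit
    return acc


def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply in GF(2^8) using 1-bit shifts and XORs.

    `poly` defaults to the AES polynomial x^8 + x^4 + x^3 + x + 1 (assumed here).
    """
    acc = 0
    for _ in range(8):
        if b & 1:                 # addition in GF(2^8) is XOR
            acc ^= a
        a <<= 1                   # multiply a by x
        if a & 0x100:             # degree-8 overflow: reduce modulo poly
            a ^= poly
        b >>= 1
    return acc


if __name__ == "__main__":
    print(shift_add_mul(13, 11))          # ordinary product, 8-bit truncated
    print(hex(gf256_mul(0x57, 0x83)))     # worked example from the AES standard
```

Both routines need only a one-position shift per iteration together with bitwise logic or addition, which is why a bidirectional single-bit shift within a row is enough to support this class of operations.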