Data movement between the processor and the main memory is a first-order obstacle to improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach in which bulk-bitwise operations are performed by leveraging intrinsic analog properties within the DRAM array and massive parallelism across DRAM columns. Unfortunately, 1) modern off-the-shelf DRAM chips do not officially support PuM operations, and 2) existing techniques for performing PuM operations on off-the-shelf DRAM chips suffer from two key limitations. First, these techniques have low success rates, i.e., only a small fraction of DRAM columns can correctly execute PuM operations, because they operate beyond manufacturer-recommended timing constraints, which makes these operations highly susceptible to noise and process variation. Second, these techniques offer limited compute primitives, preventing them from fully leveraging parallelism across DRAM columns and thus limiting their performance benefits. We propose PULSAR, a new technique to enable high-success-rate and high-performance PuM operations in off-the-shelf DRAM chips. PULSAR leverages our new observation that a carefully crafted sequence of DRAM commands can simultaneously activate up to 32 DRAM rows. PULSAR overcomes the limitations of existing techniques by 1) replicating the input data to improve the success rate and 2) enabling new bulk-bitwise operations (e.g., many-input majority, Multi-RowInit, and Bulk-Write) to improve performance. Our analysis of 120 off-the-shelf DDR4 chips from two major manufacturers shows that PULSAR achieves a 24.18% higher success rate and 121% higher performance over seven arithmetic-logic operations compared to FracDRAM, a state-of-the-art off-the-shelf DRAM-based PuM technique.
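To make the idea of input replication and many-input majority concrete, the following is a minimal software sketch, not the mechanism used on real DRAM chips. It models each simultaneously activated row as an integer bit-vector and computes a column-wise majority. The names `bulk_majority`, `replicate`, and `WIDTH` are hypothetical and chosen only for illustration. The sketch shows only the logical equivalence (a 3-input majority gives the same result when every input is stored in several rows) and does not model the analog behavior, timing, or noise that determine the success rate in hardware.

```python
from typing import List, Sequence


def bulk_majority(rows: Sequence[int], width: int) -> int:
    """Column-wise majority over an odd number of rows.

    Each row is a `width`-bit integer; bit i of the result is 1 when
    more than half of the rows have bit i set.
    """
    if len(rows) % 2 == 0:
        raise ValueError("majority needs an odd number of rows")
    threshold = len(rows) // 2
    result = 0
    for col in range(width):
        ones = sum((row >> col) & 1 for row in rows)
        if ones > threshold:
            result |= 1 << col
    return result


def replicate(inputs: Sequence[int], copies: int) -> List[int]:
    """Store every input in `copies` rows (hypothetical row layout)."""
    return [value for value in inputs for _ in range(copies)]


WIDTH = 8  # illustrative number of columns, far smaller than a real DRAM row
a, b, c = 0b11001010, 0b10100110, 0b01101100

# 3-input majority, and the same operation with each input replicated 3 times.
maj3 = bulk_majority([a, b, c], WIDTH)
maj9 = bulk_majority(replicate([a, b, c], 3), WIDTH)

# AND and OR follow from majority with one constant row.
ALL_ZEROS, ALL_ONES = 0, (1 << WIDTH) - 1
and_ab = bulk_majority([a, b, ALL_ZEROS], WIDTH)
or_ab = bulk_majority([a, b, ALL_ONES], WIDTH)
```

In this model, replication leaves the logical result unchanged, so any benefit from it would come from the physical behavior of the chip, which the sketch deliberately leaves out.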