Processing-in-memory (PIM) has been explored by computer architects for decades, yet it has never seen the light of day in real-world products because of its high design overheads and the lack of a killer application. With the advent of critical memory-intensive workloads, several commercial PIM technologies have been introduced to the market, ranging from domain-specific PIM architectures to more general-purpose PIM architectures. In this work, we take a deep dive into UPMEM's commercial PIM technology, a general-purpose PIM-enabled parallel architecture that is highly programmable. Our first key contribution is the development of a flexible simulation framework for PIM. The simulator we developed, PIMulator, enables the compilation of UPMEM-PIM source code into machine-level instructions, which are subsequently consumed by our cycle-level performance simulator. Using PIMulator, we demystify UPMEM's PIM design through a detailed characterization study. Building on this characterization, we conduct a series of case studies to identify important architectural features that we deem critical for future PIM architectures to support.