Processing-in-memory (PIM) has been explored for decades by computer architects, yet it has never seen the light of day in real-world products due to its high design overheads and the lack of a killer application. With the advent of critical memory-intensive workloads, several commercial PIM technologies have been introduced to the market, ranging from domain-specific PIM architectures to more general-purpose PIM architectures. In this work, we take a deep dive into UPMEM's commercial PIM technology, a general-purpose PIM-enabled parallel architecture that is highly programmable. Our first key contribution is the development of a flexible simulation framework for PIM. The simulator we developed (PIMulator) enables the compilation of UPMEM-PIM source code into machine-level instructions, which are subsequently consumed by our cycle-level performance simulator. Using PIMulator, we demystify UPMEM's PIM design through a detailed characterization study. Building on our characterization, we conduct a series of case studies to identify the architectural features that we believe will be critical for future PIM architectures to support.