Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g., the UPMEM system) is now available and has demonstrated potential in many applications. However, programming such hardware remains a challenge for many programmers. This paper presents a new software framework, SimplePIM, that eases the programming of real PIM systems. The framework processes arrays of arbitrary elements on a PIM device through iterator functions called from the host, and it provides primitives for communication among PIM cores and between PIM cores and the host system. We implement SimplePIM for the UPMEM PIM system and evaluate it on six major applications. Our results show that SimplePIM reduces the lines of code in PIM programs by 66.5% to 83.1%. The resulting code outperforms hand-optimized code by 10% to 37% in three applications and performs comparably in the other three. SimplePIM is fully and freely available at https://github.com/CMU-SAFARI/SimplePIM.
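To make the programming model concrete, the following is a minimal sketch of host-called iterators over an array partitioned across PIM cores. It is a plain-Python simulation that runs entirely on the host and is not SimplePIM's actual API: the names `scatter`, `pim_map`, `pim_reduce`, and `gather` are hypothetical, and each "core" is modeled as an ordinary list. The reduction assumes an associative function with an identity value, so that per-core partial results can be combined in any grouping.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def scatter(data: Sequence[T], num_cores: int) -> List[List[T]]:
    """Split host data into near-equal contiguous chunks, one per simulated core."""
    base, extra = divmod(len(data), num_cores)
    chunks: List[List[T]] = []
    start = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional element each.
        size = base + (1 if core < extra else 0)
        chunks.append(list(data[start:start + size]))
        start += size
    return chunks


def pim_map(chunks: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to every element; each chunk stands in for one core's local work."""
    return [[fn(x) for x in chunk] for chunk in chunks]


def pim_reduce(chunks: List[List[T]], fn: Callable[[T, T], T], identity: T) -> T:
    """Reduce within each simulated core, then combine the partial results on the host."""
    partials = [reduce(fn, chunk, identity) for chunk in chunks]
    return reduce(fn, partials, identity)


def gather(chunks: List[List[T]]) -> List[T]:
    """Concatenate per-core chunks back into a single host-side array."""
    return [x for chunk in chunks for x in chunk]


if __name__ == "__main__":
    host_array = list(range(10))
    chunks = scatter(host_array, num_cores=4)         # host -> cores
    squared = pim_map(chunks, lambda x: x * x)        # element-wise iterator
    print(gather(squared))                            # cores -> host
    print(pim_reduce(squared, lambda a, b: a + b, 0)) # sum of squares
```

In this sketch the host code never indexes into a core's chunk directly; it only passes element-level functions to the iterators, which is the style of interface the abstract describes.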