Dynamic memory allocation is a fundamental feature of modern programming languages, yet current general-purpose processing-in-memory (PIM) devices do not adequately support it. To identify the key design principles that PIM must consider, we conduct a design space exploration of PIM memory allocators, examining various strategies for placing and managing the allocator's metadata. Based on this exploration, we introduce PIM-malloc, a fast and scalable memory allocator for general-purpose PIM that runs on real PIM hardware and achieves a 66× improvement in memory allocation performance. We further enhance this design with a lightweight, per-PIM-core hardware cache built specifically for dynamic memory allocation, which yields an additional 31% performance improvement. Finally, we show the applicability of PIM-malloc by developing several representative PIM workloads, demonstrating its effectiveness in enhancing programmability.