DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures

The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory, which reduces data movement overheads. UPMEM is the first commercially available PIM system, featuring thousands of in-order processors (DPUs) integrated within DRAM modules. However, programming UPMEM-based systems remains challenging because the programmer must explicitly manage data and partition the workload across DPUs. We introduce DaPPA (data-parallel processing-in-memory architecture), a programming framework that eases the programmability of UPMEM systems by automatically managing data movement, memory allocation, and workload distribution. The key idea behind DaPPA is to leverage a high-level, pattern-based, data-parallel programming interface to abstract hardware complexities away from the programmer. DaPPA comprises three main components: (i) data-parallel pattern APIs, a collection of five primary data-parallel pattern primitives that allow the programmer to express data transformations within an application; (ii) a dataflow programming interface, which allows the programmer to define how data moves across data-parallel patterns; and (iii) dynamic template-based compilation, which leverages code skeletons and dynamic code transformations to convert data-parallel patterns implemented via the dataflow programming interface into an optimized UPMEM binary. We evaluate DaPPA using six workloads from the PrIM benchmark suite on a real UPMEM system. Compared to hand-tuned implementations, DaPPA improves end-to-end performance by 2.1x on average and reduces programming complexity (measured in lines of code) by 94%. Our results demonstrate that DaPPA is an effective framework for efficient and user-friendly programming on UPMEM systems.
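To make the idea of composing data-parallel patterns through a dataflow interface more concrete, here is a minimal host-side sketch in Python. It is not DaPPA's API and does not target UPMEM hardware. The `Pipeline` class, its `map`/`filter`/`reduce` methods, the `partition` helper, and the `num_units` parameter are all hypothetical names chosen for illustration. The abstract does not name the five pattern primitives, so the three used here are generic examples. The sketch only shows how a framework of this kind could split the input across processing units, apply the chained patterns to each chunk independently, and combine the partial results without the programmer writing any partitioning code.

```python
from functools import reduce as fold
from operator import add


def partition(data, num_units):
    """Split data into num_units contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), num_units)
    chunks, start = [], 0
    for unit in range(num_units):
        size = base + (1 if unit < extra else 0)
        chunks.append(list(data[start:start + size]))
        start += size
    return chunks


class Pipeline:
    """Hypothetical dataflow of element-wise patterns (illustration only)."""

    def __init__(self, num_units=4):
        self.num_units = num_units  # stands in for the number of DPUs
        self.stages = []            # ordered list of (kind, function) pairs

    def map(self, fn):
        self.stages.append(("map", fn))
        return self

    def filter(self, pred):
        self.stages.append(("filter", pred))
        return self

    def run(self, data):
        """Apply all stages to each chunk; return the per-unit results."""
        results = []
        for chunk in partition(data, self.num_units):
            for kind, fn in self.stages:
                if kind == "map":
                    chunk = [fn(x) for x in chunk]
                else:
                    chunk = [x for x in chunk if fn(x)]
            results.append(chunk)
        return results

    def reduce(self, data, fn, init):
        """Reduce each chunk locally, then combine the partial results.

        Assumes fn is associative and init is its identity element.
        """
        partials = [fold(fn, chunk, init) for chunk in self.run(data)]
        return fold(fn, partials, init)


# Sum of the even squares of 0..9, computed over 4 simulated units.
total = (
    Pipeline(num_units=4)
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .reduce(range(10), add, 0)
)
print(total)  # 120
```

The user code at the bottom describes only what happens to the data. Chunk sizes, the per-unit loop, and the final combination step are all hidden inside `Pipeline`, which is the kind of separation the abstract attributes to a pattern-based interface.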
