Dynamic programming (DP) algorithms, such as All-Pairs Shortest Path (APSP) and genomic sequence alignment, are fundamental to many scientific domains but are severely bottlenecked by data movement on conventional architectures. While Processing-in-Memory (PIM) offers a promising solution, existing accelerators often address only a fraction of the workflow, creating new system-level bottlenecks in host-accelerator communication and off-chip data streaming. In this work, we propose GenDRAM, a massively parallel PIM accelerator that overcomes these limitations. GenDRAM leverages the large capacity and high internal bandwidth of monolithic 3D DRAM (M3D DRAM) to integrate entire data-intensive pipelines, such as the full genomics workflow from seeding to alignment, onto a single heterogeneous chip. At its core is a novel architecture that pairs specialized Search processing units (PUs) for memory-intensive tasks with universal, multiplier-less Compute PUs for diverse DP calculations. The architecture is enabled by a 3D-aware data mapping strategy that exploits the tiered latency of M3D DRAM to optimize performance. Through comprehensive simulation, we demonstrate that GenDRAM outperforms state-of-the-art GPU systems by over 68x on APSP and by over 22x on the end-to-end genomics pipeline.
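To illustrate why a multiplier-less compute unit can be sufficient for this class of DP kernels, the following minimal sketch implements APSP with the textbook Floyd-Warshall recurrence, whose inner loop uses only an addition and a comparison (a min-plus update). This is an illustrative software example, not GenDRAM's hardware mapping or the authors' implementation; the function name `floyd_warshall` and the sample graph are assumptions introduced here.

```python
INF = float("inf")


def floyd_warshall(weights):
    """Return all-pairs shortest-path lengths for a dense weight matrix.

    weights[i][j] is the edge weight from i to j (INF if there is no edge).
    Assumes no negative cycles. Illustrative sketch only.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so k cannot improve row i
            row_i = dist[i]
            for j in range(n):
                # Min-plus update: one addition and one comparison,
                # no multiplication anywhere in the kernel.
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
```

Sequence alignment recurrences with additive scoring have the same shape: each cell is a maximum over a few sums of neighboring cells and scores, which is why a single add-and-compare datapath can plausibly serve both workloads.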