In today's data centers, personalized recommendation systems demand large memory capacity and high memory bandwidth, particularly for embedding operations. Previous approaches have relied on DIMM-based near-memory processing or on 3D-stacked DRAM to relieve memory-bound bottlenecks and expand memory bandwidth. However, these solutions fall short as recommendation systems keep growing. Models now exceed tens of terabytes, which makes them difficult to run efficiently on a conventional single-node inference server. Various algorithmic methods have been proposed to reduce embedding table capacity, but they often increase the number of memory accesses or use memory resources inefficiently. This paper introduces HEAM, a heterogeneous memory architecture that integrates 3D-stacked DRAM with DIMMs to accelerate recommendation systems that use compositional embedding, a technique for shrinking embedding tables. The architecture is organized as a three-tier memory hierarchy consisting of conventional DIMMs, 3D-stacked DRAM with base die-level Processing-In-Memory (PIM), and bank group-level PIM incorporating lookup tables. This organization is designed around the distinctive characteristics of compositional embedding, such as its temporal locality and its embedding table capacity. The design reduces bank accesses, improves access efficiency, and increases overall throughput, achieving a 6.3× speedup and 58.9% energy savings over the baseline.
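The abstract does not say which compositional embedding scheme is used. The sketch below assumes the quotient-remainder (QR) trick, one well-known form of compositional embedding. It shows why table capacity shrinks (one large table is replaced by two small ones) and why memory accesses grow (every ID now reads two rows instead of one). The vectors are combined here by element-wise multiplication, which is one of the combining operations used with the QR trick. All function names and sizes are hypothetical and are not taken from HEAM.

```python
import random

def make_qr_tables(num_rows, dim, num_buckets, seed=0):
    """Build hypothetical quotient and remainder tables replacing a num_rows x dim table."""
    rng = random.Random(seed)
    q_rows = -(-num_rows // num_buckets)  # ceil(num_rows / num_buckets)
    q_table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(q_rows)]
    r_table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_buckets)]
    return q_table, r_table

def qr_embedding(idx, q_table, r_table, num_buckets):
    """Compose one embedding from two row reads (quotient row and remainder row)."""
    q_vec = q_table[idx // num_buckets]
    r_vec = r_table[idx % num_buckets]  # small table shared by all IDs, so rows are reused often
    return [a * b for a, b in zip(q_vec, r_vec)]

def pooled_embedding(indices, q_table, r_table, num_buckets):
    """Gather-and-sum over a bag of IDs, as in a typical embedding operation."""
    out = [0.0] * len(q_table[0])
    for idx in indices:
        vec = qr_embedding(idx, q_table, r_table, num_buckets)
        for k, v in enumerate(vec):
            out[k] += v
    return out

if __name__ == "__main__":
    num_rows, dim, num_buckets = 1_000_000, 16, 1000
    q_table, r_table = make_qr_tables(num_rows, dim, num_buckets)
    print("rows stored:", num_rows, "->", len(q_table) + len(r_table))
    bag = [3, 1003, 999_999]
    print("row reads per bag:", len(bag), "->", 2 * len(bag))
    print(pooled_embedding(bag, q_table, r_table, num_buckets)[:4])
```

Under these assumptions, a 1,000,000-row table is stored as 2,000 rows, but each lookup costs two row reads. This mirrors the trade-off the abstract describes, where capacity savings come at the price of extra memory accesses.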