In today's data centers, personalized recommendation systems demand large memory capacity and high memory bandwidth, especially for embedding operations. Previous approaches have relied on DIMM-based near-memory processing or introduced 3D-stacked DRAM to relieve memory-bound bottlenecks and increase memory bandwidth. However, these solutions fall short as personalized recommendation systems continue to grow. Recommendation models now exceed tens of terabytes, which makes them difficult to run efficiently on traditional single-node inference servers. Although various algorithmic methods have been proposed to reduce embedding table capacity, they often increase the number of memory accesses or use memory resources inefficiently. This paper introduces HEAM, a heterogeneous memory architecture that integrates 3D-stacked DRAM with DIMM to accelerate recommendation systems that use compositional embedding, a technique aimed at reducing the size of embedding tables. The architecture is organized as a three-tier memory hierarchy consisting of conventional DIMM, 3D-stacked DRAM with base die-level Processing-In-Memory (PIM), and bank group-level PIM incorporating a lookup table. This hierarchy is designed around the characteristics of compositional embedding, such as its temporal locality and embedding table capacity. The design reduces bank accesses, improves access efficiency, and raises overall throughput, yielding a 6.3x speedup and 58.9% energy savings compared to the baseline.
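To make the capacity trade-off behind compositional embedding concrete, the following is a minimal sketch of one well-known variant, the quotient-remainder scheme. The abstract does not state which variant HEAM targets, so the choice of scheme, the element-wise product used to combine rows, and all names (`CompositionalEmbedding`, `num_buckets`, `rows_stored`) are assumptions made purely for illustration.

```python
import random


def make_table(rows, dim, rng):
    """Build a rows x dim table of random floats (stand-in for trained weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


class CompositionalEmbedding:
    """Hypothetical quotient-remainder embedding; not the HEAM implementation."""

    def __init__(self, num_ids, num_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.num_ids = num_ids
        self.num_buckets = num_buckets
        # Ceiling division: number of distinct quotients for ids in [0, num_ids).
        quotient_rows = -(-num_ids // num_buckets)
        self.q_table = make_table(quotient_rows, dim, rng)
        self.r_table = make_table(num_buckets, dim, rng)

    def lookup(self, idx):
        """Return the embedding for idx by combining two small-table rows."""
        if not 0 <= idx < self.num_ids:
            raise IndexError(f"id {idx} out of range")
        q, r = divmod(idx, self.num_buckets)
        # Two row reads per id instead of one; combined element-wise (assumed).
        return [a * b for a, b in zip(self.q_table[q], self.r_table[r])]

    def rows_stored(self):
        """Total rows kept in memory across both tables."""
        return len(self.q_table) + len(self.r_table)


emb = CompositionalEmbedding(num_ids=1_000_000, num_buckets=1000, dim=4)
print(emb.rows_stored())  # 2000 rows instead of 1,000,000
vector = emb.lookup(123_456)  # reads q_table[123] and r_table[456]
```

In this sketch, each lookup reads two rows rather than one, which illustrates how a capacity-reducing method can increase the number of memory accesses. The two tables are also far smaller than the original, so their rows are reused across many different ids; whether this matches the temporal locality that HEAM exploits is an assumption.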