Over the last three decades, innovations in the memory subsystem have primarily targeted the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities of future HPC-focused processors, particularly with 3D-stacked SRAM. First, we propose a method, oblivious to the memory subsystem, to gauge the upper bound on performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. Through a large volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and we conclude that cache-sensitive HPC applications see an average boost of 9.56x on a per-chip basis. Additionally, we exhaustively document our methodological exploration to motivate HPC centers to drive their own technological agenda through enhanced co-design.