Modern NVIDIA GPUs such as the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key variable, the power distribution between the memory and the Streaming Multiprocessors (SMs) changes notably between the two GPUs. In the era of energy-efficient computing, analyzing how these hardware characteristics affect performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and identify outliers that consume more memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, representing the two extremes of the Roofline model. Our observations indicate that, across varying power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
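As a reading aid for the "two extremes of the Roofline model" mentioned above, the following is a minimal sketch of the standard Roofline bound and of a performance-per-watt ratio. It is not the study's methodology. The function names and all numeric values are hypothetical placeholders chosen for easy arithmetic; they are not H100 or H200 specifications or measurements.

```python
def attainable_performance(peak_flops, bandwidth_bytes, intensity):
    """Roofline bound in FLOP/s.

    peak_flops      -- peak compute rate (FLOP/s)
    bandwidth_bytes -- peak memory bandwidth (bytes/s)
    intensity       -- arithmetic intensity of the kernel (FLOP/byte)
    """
    return min(peak_flops, bandwidth_bytes * intensity)


def performance_per_watt(flops_per_second, watts):
    """Energy-efficiency metric: achieved FLOP/s divided by average power."""
    return flops_per_second / watts


# Hypothetical placeholder device (NOT real GPU specifications).
PEAK = 100.0e9       # 100 GFLOP/s
BANDWIDTH = 10.0e9   # 10 GB/s

# Ridge point: the intensity at which the bound switches from
# bandwidth-limited to compute-limited.
ridge = PEAK / BANDWIDTH

# A streaming-style kernel with low intensity sits on the bandwidth slope.
memory_bound = attainable_performance(PEAK, BANDWIDTH, 0.125)

# A dense matrix-multiply-style kernel with high intensity hits the compute roof.
compute_bound = attainable_performance(PEAK, BANDWIDTH, 50.0)

print(f"ridge point:   {ridge} FLOP/byte")
print(f"memory-bound:  {memory_bound:.3e} FLOP/s")
print(f"compute-bound: {compute_bound:.3e} FLOP/s")
```

Under this model, extra memory bandwidth raises the bound only for kernels to the left of the ridge point, which is consistent with the abstract's observation that the two GPUs differ mainly on memory-bound workloads.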