The Memory stress (Mess) framework provides a unified view of memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds of measurements that are represented as a family of bandwidth--latency curves. The benchmark extends the coverage of previous tools and leads to new findings about the behavior of actual and simulated memory systems. We deploy the Mess benchmark to characterize Intel, AMD, IBM, Fujitsu, Amazon and NVIDIA servers with DDR4, DDR5, HBM2 and HBM2E memory. The Mess memory simulator uses the bandwidth--latency concept for memory performance simulation. We integrate Mess with widely used CPU simulators, enabling modeling of all high-end memory technologies. The Mess simulator is fast, easy to integrate, and closely matches the actual system performance. By design, it enables quick adoption of new memory technologies in hardware simulators. Finally, Mess application profiling positions the application in the bandwidth--latency space of the target memory system. This information can be correlated with other application runtime activities and the source code, leading to a better overall understanding of the application's behavior. The current Mess benchmark release covers all major CPU and GPU ISAs: x86, ARM, Power, RISC-V, and NVIDIA's PTX. We also release as open source the ZSim, gem5 and OpenPiton Metro-MPI simulators integrated with the Mess simulator for DDR4, DDR5, Optane, HBM2, HBM2E and CXL memory expanders. Mess application profiling is already integrated into a suite of production HPC performance analysis tools.
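To make the bandwidth--latency concept concrete, the following is a minimal sketch of how a family of curves could be queried for a latency estimate. It is not the Mess simulator's implementation: the curve data, the names `CURVES`, `latency_on_curve` and `estimate_latency`, the nearest-read-ratio selection, and the assumption that bandwidth increases monotonically along each curve are all hypothetical choices made for illustration.

```python
from bisect import bisect_left

# Hypothetical sample points (bandwidth in GB/s, latency in ns), keyed by the
# fraction of reads in the traffic. Illustrative values only; these are NOT
# measurements produced by the Mess benchmark.
CURVES = {
    1.0: [(0.0, 90.0), (50.0, 100.0), (100.0, 130.0), (120.0, 250.0)],
    0.5: [(0.0, 95.0), (40.0, 110.0), (80.0, 160.0), (100.0, 300.0)],
}


def latency_on_curve(curve, bandwidth):
    """Linearly interpolate latency at `bandwidth`, clamping at the curve ends.

    Assumes the curve's bandwidth values are strictly increasing.
    """
    bandwidths = [bw for bw, _ in curve]
    if bandwidth <= bandwidths[0]:
        return curve[0][1]
    if bandwidth >= bandwidths[-1]:
        return curve[-1][1]
    i = bisect_left(bandwidths, bandwidth)
    (bw0, lat0), (bw1, lat1) = curve[i - 1], curve[i]
    t = (bandwidth - bw0) / (bw1 - bw0)
    return lat0 + t * (lat1 - lat0)


def estimate_latency(read_ratio, bandwidth):
    """Select the curve with the nearest read ratio, then interpolate on it."""
    nearest = min(CURVES, key=lambda r: abs(r - read_ratio))
    return latency_on_curve(CURVES[nearest], bandwidth)
```

For example, `estimate_latency(1.0, 75.0)` falls halfway between the second and third points of the all-reads curve, and a request beyond the last sampled bandwidth is clamped to the last latency on the selected curve.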