High-end ARM processors are emerging in data centers and HPC systems as strong contenders to x86 machines. Memory-centric profiling is an important approach for dissecting an application's memory-access bottlenecks and guiding optimizations. Many existing memory profiling tools leverage hardware performance counters and precise event sampling, such as Intel PEBS and AMD IBS, to achieve high accuracy and low overhead. In this work, we present a multi-level memory profiling tool for ARM processors that leverages the Statistical Profiling Extension (SPE). We evaluate the tool using both HPC and cloud workloads on the ARM Ampere processor. Our results provide the first quantitative assessment of the time overhead and sampling accuracy of ARM SPE for memory-centric profiling at different sampling periods and aux buffer sizes.