In recent years, HPC systems, and the CPU architectures at their core, have become increasingly complex, making application development and optimization challenging. Intuitive performance models such as the Cache-aware Roofline Model (CARM) offer effective guidance here by exposing the bottlenecks that keep an application from reaching the system's maximum performance. To fully exploit CARM's optimization guidance during application development, automatic tools for cross-architecture model construction and in-depth application characterization are essential. Yet across the wide range of existing CPU architectures, current CARM-enabled tools are either vendor-specific (Intel Advisor), insufficiently developed (ARM), or simply nonexistent (AMD, RISC-V). This work intends to close that gap by bringing automatic CARM support to all major CPU architectures and ISAs, i.e., x86 (Intel, AMD), ARM, and RISC-V. To this end, it develops assembly microbenchmarks specifically tailored to cover the full performance spectrum of modern CPUs (from scalar to all supported vector ISA extensions) for both the computational units and all memory hierarchy levels. Additionally, this work integrates application analysis within the CARM framework using performance counters and dynamic binary instrumentation. Experimental results show that the CARM roofs constructed with the proposed automated framework deviate by less than 1% from the various tested architectural maximums.
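As a rough illustration of what a CARM roof expresses, the sketch below evaluates the standard roofline bound, attainable performance = min(bandwidth × arithmetic intensity, peak compute), once per memory level. The function name `carm_attainable` and all numeric values are hypothetical placeholders chosen for illustration; they are not measurements or code from the framework described above, which obtains the peaks and bandwidths from its assembly microbenchmarks.

```python
def carm_attainable(ai, peak_gflops, bandwidths_gbs):
    """Attainable GFLOP/s per memory level for arithmetic intensity `ai` (FLOP/byte).

    Each memory level contributes a sloped roof (bandwidth * ai) that is
    capped by the flat compute roof (peak_gflops).
    """
    return {level: min(bw * ai, peak_gflops) for level, bw in bandwidths_gbs.items()}


# Hypothetical peak compute (GFLOP/s) and per-level bandwidths (GB/s).
peak = 100.0
bw = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}

# A kernel performing 0.25 FLOP per byte moved to or from the core.
roofs = carm_attainable(0.25, peak, bw)
for level, gflops in roofs.items():
    print(f"{level}: {gflops} GFLOP/s")
```

Placing an application's measured performance and arithmetic intensity against these roofs indicates which memory level, or the compute units, bounds it.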