Performance models are instrumental for optimizing performance-sensitive code. When modeling the use of functional units of out-of-order x86-64 CPUs, data availability varies by manufacturer: instruction-to-port mappings for Intel's processors are available, whereas information for AMD's designs is lacking. The reason for this disparity is that standard techniques for inferring exact port mappings require hardware performance counters that AMD does not provide. In this work, we modify the port mapping inference algorithm of the widely used uops.info project so that it no longer relies on Intel's performance counters. The modifications are based on a formal port mapping model with a counter-example-guided algorithm powered by an SMT solver. We investigate to what extent AMD's processors comply with this model and where unexpected performance characteristics prevent an accurate port mapping. Our results provide valuable insights for creators of CPU performance models as well as for software developers who want to achieve peak performance on recent AMD CPUs.
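To make the notion of a port mapping model concrete, the following minimal sketch computes the steady-state inverse throughput (cycles per iteration) that a port mapping predicts for a block of µops. It assumes the common bottleneck formulation in which the busiest subset of ports determines throughput; the function name `inverse_throughput`, the data layout, and the example port numbers are hypothetical, and this is not the inference algorithm itself, only an illustration of the kind of model such an algorithm has to fit.

```python
from itertools import combinations


def inverse_throughput(uops, ports):
    """Predict cycles per iteration for a multiset of uops.

    uops  : list of (count, allowed_ports) pairs, where allowed_ports is a
            frozenset of the ports that can execute this kind of uop.
    ports : iterable of all port identifiers of the (hypothetical) core.

    Assumption: each port executes at most one uop per cycle and the
    scheduler balances load optimally. Under that assumption the result
    is the maximum, over all non-empty port subsets Q, of
    (number of uops that can only run on ports in Q) / |Q|.
    """
    all_ports = sorted(ports)
    worst = 0.0
    for size in range(1, len(all_ports) + 1):
        for subset in combinations(all_ports, size):
            q = frozenset(subset)
            # Count uops that are confined to this subset of ports.
            load = sum(count for count, allowed in uops if allowed <= q)
            worst = max(worst, load / len(q))
    return worst


# Hypothetical three-port core: two uops may use ports 0 or 1,
# two further uops are restricted to port 0.
example = [(2, frozenset({0, 1})), (2, frozenset({0}))]
cycles = inverse_throughput(example, ports={0, 1, 2})
```

Enumerating every port subset is exponential in the number of ports, which is acceptable for the handful of ports on a real core but is chosen here for clarity rather than efficiency.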