Modern out-of-order (OoO) CPUs are complex systems whose many components interact in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical to making the most of hardware resources. We provide an in-depth overview of performance bottlenecks in recent OoO microarchitectures and describe the difficulties of detecting them. Techniques that measure resource utilization can offer a good understanding of a program's execution but, because of constraints inherent to the Performance Monitoring Units (PMUs) of CPUs, do not provide the relevant metrics for every use case. Another approach is to rely on a performance model to simulate the CPU's behavior. Such a model makes it possible to implement any new microarchitecture-related metric. Within this framework, we advocate implementing modeled resources as parameters that can be varied at will to reveal performance bottlenecks. This allows a generalization of bottleneck analysis that we call sensitivity analysis. We present Gus, a novel performance analysis tool that combines the advantages of sensitivity analysis and dynamic binary instrumentation within a resource-centric CPU model. We evaluate the impact of sensitivity on bottleneck analysis over a set of high-performance computing kernels.
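The idea of treating modeled resources as variable parameters can be illustrated with a minimal sketch. This is not Gus's implementation: the `Resource` class, the `estimate_cycles` throughput bound, the resource names, and all capacities and demands below are hypothetical, chosen only to show how scaling one resource at a time exposes the one that limits performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    capacity: float  # hypothetical: work units the resource handles per cycle


def estimate_cycles(demand, resources):
    """Toy throughput bound: the most saturated resource sets the cycle count."""
    return max(demand[r.name] / r.capacity for r in resources)


def sensitivity(demand, resources, factor=2.0):
    """Speedup obtained when each resource's capacity is scaled in isolation."""
    baseline = estimate_cycles(demand, resources)
    speedups = {}
    for target in resources:
        # Vary only the targeted resource; every other resource is unchanged.
        varied = [
            Resource(r.name, r.capacity * factor) if r.name == target.name else r
            for r in resources
        ]
        speedups[target.name] = baseline / estimate_cycles(demand, varied)
    return speedups


# Hypothetical machine and kernel; the numbers are illustrative only.
resources = [
    Resource("frontend", 4.0),
    Resource("alu_ports", 3.0),
    Resource("load_ports", 2.0),
]
demand = {"frontend": 400.0, "alu_ports": 150.0, "load_ports": 300.0}

speedups = sensitivity(demand, resources)
# A resource is reported as a bottleneck if enlarging it alone speeds up the kernel.
bottlenecks = [name for name, s in speedups.items() if s > 1.0]
print(speedups, bottlenecks)
```

In this toy setting only `load_ports` yields a speedup when doubled, so it is the sole bottleneck. A real resource-centric model would also have to account for dependencies, latencies, and finite buffers, which this sketch deliberately ignores.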