With the growing prevalence of heterogeneous computing, CPUs are increasingly paired with accelerators to reach new levels of performance and energy efficiency. However, data movement between devices remains a significant bottleneck and complicates application development. Existing performance tools require considerable programmer intervention to diagnose and locate data transfer inefficiencies. To address this, we propose dynamic analysis techniques that detect and profile inefficient data transfer and allocation patterns in heterogeneous applications. We implemented these techniques in OMPDataPerf, which provides detailed traces of problematic data mappings, source code attribution, and assessments of optimization potential in heterogeneous OpenMP applications. OMPDataPerf uses the OpenMP Tools Interface (OMPT) and incurs only a 5% geometric-mean runtime overhead.
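As an illustration of the kind of inefficiency discussed above, the following minimal sketch contrasts a loop that remaps an array on every iteration with one that maps it once. The function names, the array `x`, and the choice of a repeated-transfer pattern are assumptions made for this example; they are not taken from OMPDataPerf or its evaluation.

```c
/* Hypothetical example: both functions double every element `steps` times. */

/* Inefficient: the map clause copies x to the device and back
 * on every iteration of the outer loop. */
void scale_remap_each_step(double *x, int n, int steps) {
    for (int s = 0; s < steps; ++s) {
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
        for (int i = 0; i < n; ++i) {
            x[i] *= 2.0;
        }
    }
}

/* Improved: an enclosing target data region copies x to the device once
 * and back once. The inner map clause finds x already present, so it
 * only adjusts the reference count and triggers no transfer. */
void scale_map_once(double *x, int n, int steps) {
    #pragma omp target data map(tofrom: x[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
            for (int i = 0; i < n; ++i) {
                x[i] *= 2.0;
            }
        }
    }
}
```

Both functions compute the same result; they differ only in how often the array crosses the host-device boundary. Built without OpenMP support, the pragmas are ignored and both versions run on the host.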