With heterogeneous systems, the number of GPUs per chip increases to provide the computational capability needed to solve scientific problems at a nanoscopic scale. However, low utilization of individual GPUs undermines the case for investing in expensive accelerators. While related work develops optimizations that improve application performance, none studies how these optimizations affect hardware resource usage or average GPU utilization. This paper takes a data-driven approach to addressing this gap by (1) characterizing how hardware resource usage affects device utilization, execution time, or both; (2) presenting a multi-objective metric that identifies important application-device interactions that can be optimized to jointly improve device utilization and application performance; (3) studying the hardware resource usage behaviors of several optimizations for a benchmark application; and (4) identifying optimization opportunities for several scientific proxy applications based on their hardware resource usage behaviors. Furthermore, we demonstrate the applicability of our methodology by applying the identified optimizations to a proxy application, which improves execution time, device utilization, and power consumption by up to 29.6%, 5.3%, and 26.5%, respectively.