The ubiquity of AI makes optimizing GPU power a priority, as large GPU-based clusters are often employed to train and serve AI models. An important first step in optimizing GPU power consumption is high-fidelity, fine-grain power measurement of key AI computations on GPUs. We observe, however, that as GPUs get more powerful, the resulting sub-millisecond to millisecond executions make fine-grain power analysis challenging. In this work, we first carefully identify the challenges in obtaining fine-grain GPU power profiles. To address these challenges, we devise the FinGraV methodology, which employs execution time binning, careful CPU-GPU time synchronization, and power profile differentiation to collect fine-grain GPU power profiles across prominent AI computations and a spectrum of scenarios. Using these FinGraV power profiles, we provide both guidance on accurate power measurement and an in-depth view of power consumption on the state-of-the-art AMD Instinct MI300X. For the former, we highlight a methodology for power differentiation across executions. For the latter, we make several observations pertaining to GPU sub-component power consumption and GPU power proportionality across different scenarios. We believe that FinGraV unlocks both an accurate and a deeper view of GPU power consumption and opens up avenues for power optimization of these ubiquitous accelerators.