The ever-increasing demand for ML-driven intelligence across a wide spectrum of domains has led to the ubiquity of GPUs. At the same time, GPUs are notorious for their power consumption and often dominate the power allocation of a typical ML datacenter. While datacenter-level power optimizations that focus on collections of GPUs are promising, in this work we take a different tack: we take a closer look at power consumption inside a GPU. Specifically, because modern GPUs are composed of integrated components, we make a case for component-awareness, termed CompPow in this work, for improved power management in modern GPUs. We demonstrate that, for a variety of ML operations and execution patterns, CompPow has the potential to deliver higher energy efficiency (10%) and even improved performance (5%). We conclude with recommendations on how component-aware software-hardware co-design can extract additional energy efficiency from modern GPUs.