GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests a need to both reduce the cost of execution-idle and reduce exposure to it. We therefore build two prototypes: one uses automatic downscaling during execution-idle, and the other uses load imbalance to reduce exposure, both with performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
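As a rough illustration of how time and energy shares like those above could be computed from per-second telemetry, the sketch below flags a sample as execution-idle when activity is low but power stays high. The thresholds (`util_max`, `power_min`), the `Sample` fields, and the toy trace are all assumptions made for illustration; they are not the paper's definition or data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One per-second telemetry sample taken while a job is running (hypothetical schema)."""
    util_pct: float  # visible GPU activity, 0-100
    power_w: float   # board power draw in watts


def execution_idle_share(samples, util_max=5.0, power_min=100.0):
    """Return (time_share, energy_share) of samples flagged as execution-idle.

    A sample is flagged when activity is at or below util_max while power is
    at or above power_min. Both thresholds are illustrative assumptions.
    """
    if not samples:
        return 0.0, 0.0
    flagged = [s for s in samples
               if s.util_pct <= util_max and s.power_w >= power_min]
    # Each sample covers one second, so summing watts gives joules.
    total_energy = sum(s.power_w for s in samples)
    time_share = len(flagged) / len(samples)
    energy_share = (sum(s.power_w for s in flagged) / total_energy
                    if total_energy else 0.0)
    return time_share, energy_share


# Toy trace: two busy seconds, two execution-idle seconds, one truly idle second.
trace = [Sample(95, 300), Sample(2, 120), Sample(90, 300),
         Sample(1, 120), Sample(0, 40)]
time_share, energy_share = execution_idle_share(trace)
print(f"time share: {time_share:.3f}, energy share: {energy_share:.3f}")
```

In this toy trace the energy share comes out lower than the time share, because flagged seconds draw less power than busy seconds while still drawing far more than a truly idle GPU. That is the same direction as the reported 19.7% of time versus 10.7% of energy.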