As AI workloads drive increases in datacenter power consumption, accurate GPU power estimation is critical for proactive power management. However, existing power models face a scalability bottleneck that lies not in the modeling techniques themselves, but in obtaining the hardware utilization inputs they require. Conventional approaches obtain these inputs through either costly simulation or hardware profiling, which makes them impractical when rapid predictions are required. This work presents EnergAIzer, which addresses the bottleneck with a lightweight method for predicting utilization inputs, reducing estimation wall-clock time from hours to seconds. Our key insight is that kernels in AI workloads commonly employ optimizations that create structured patterns, from which memory traffic and the execution timeline can be determined analytically. We construct a performance model that uses these patterns as an analytical scaffold for empirical data fitting, and that naturally exposes module-level utilization. The predicted utilization is then fed into our power model to estimate dynamic power consumption. EnergAIzer achieves 8% power error on NVIDIA Ampere GPUs, competitive with traditional power models that rely on elaborate cycle-level simulation or hardware profiling. We demonstrate EnergAIzer's exploration capabilities for frequency scaling and architectural configurations, including forecasting the power of the NVIDIA H100 with just 7% error. In summary, EnergAIzer provides fast and accurate power prediction for AI workloads, paving the way for power-aware design exploration.
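To make the pipeline described above concrete (structured pattern, then analytical memory traffic and timeline, then utilization, then dynamic power), the following is a minimal illustrative sketch and not EnergAIzer's actual model. It assumes a tiled GEMM with no data reuse across output tiles, a roofline-style runtime, and a linear utilization-to-power mapping. The names `GpuSpec`, `tiled_gemm_traffic`, and `estimate`, and the coefficients `w_compute` and `w_memory`, are hypothetical, and the numeric values are placeholders rather than measurements of any real GPU.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class GpuSpec:
    peak_flops: float  # peak compute throughput, FLOP/s
    peak_bw: float     # peak DRAM bandwidth, bytes/s
    w_compute: float   # hypothetical fitted watts at 100% compute utilization
    w_memory: float    # hypothetical fitted watts at 100% memory utilization


def tiled_gemm_traffic(m, n, k, tile_m, tile_n, bytes_per_elem=2):
    """DRAM bytes for an M x N x K GEMM, assuming no reuse across output tiles."""
    tiles = ceil(m / tile_m) * ceil(n / tile_n)
    # Each output tile streams a (tile_m x k) slab of A and a (k x tile_n) slab of B.
    reads = tiles * (tile_m * k + k * tile_n)
    writes = m * n  # each output element is written once
    return (reads + writes) * bytes_per_elem


def estimate(gpu, m, n, k, tile_m, tile_n):
    """Roofline-style runtime, per-module utilization, and linear dynamic power."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    traffic = tiled_gemm_traffic(m, n, k, tile_m, tile_n)
    t_compute = flops / gpu.peak_flops
    t_memory = traffic / gpu.peak_bw
    runtime = max(t_compute, t_memory)  # assumes perfect compute/memory overlap
    util_compute = t_compute / runtime
    util_memory = t_memory / runtime
    power = gpu.w_compute * util_compute + gpu.w_memory * util_memory
    return {
        "runtime_s": runtime,
        "util_compute": util_compute,
        "util_memory": util_memory,
        "dynamic_power_w": power,
    }


if __name__ == "__main__":
    # Placeholder hardware numbers chosen only for illustration.
    gpu = GpuSpec(peak_flops=1e12, peak_bw=1e11, w_compute=100.0, w_memory=50.0)
    print(estimate(gpu, m=128, n=128, k=128, tile_m=64, tile_n=64))
```

In this sketch the tiling pattern alone fixes the traffic term, so no simulation or profiling is needed to obtain utilization; in practice the coefficients would have to be fitted to measured data, and cache reuse would lower the traffic estimate.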