EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads

As AI workloads drive increases in datacenter power consumption, accurate GPU power estimation is critical for proactive power management. However, existing power models face a scalability bottleneck that lies not in the modeling techniques themselves, but in obtaining the hardware utilization inputs they require. Conventional approaches rely on either costly simulation or hardware profiling, which makes them impractical when rapid predictions are required. This work presents EnergAIzer, which addresses this bottleneck with a lightweight method for predicting utilization inputs, reducing estimation wall time from hours to seconds. Our key insight is that kernels in AI workloads commonly employ optimizations that create structured patterns, and these patterns analytically determine memory traffic and the execution timeline. We construct a performance model that uses these patterns as an analytical scaffold for empirical data fitting, and that also naturally exposes module-level utilization. The predicted utilization is then fed into our power model to estimate dynamic power consumption. EnergAIzer achieves 8% power error on NVIDIA Ampere GPUs, competitive with traditional power models that rely on elaborate cycle-level simulation or hardware profiling. We demonstrate EnergAIzer's exploration capabilities for frequency scaling and architectural configurations, including forecasting the power of the NVIDIA H100 with just 7% error. In summary, EnergAIzer provides fast and accurate power prediction for AI workloads, paving the way for power-aware design exploration.
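
The pipeline described in the abstract (structured kernel patterns, then predicted utilization, then dynamic power) can be illustrated with a toy sketch. This is not EnergAIzer's model: it swaps in a plain roofline bound for the performance model and a linear utilization-weighted sum for the power model. All names (`tiled_gemm_traffic`, `roofline_estimate`, `dynamic_power`), the tiling traffic formula, and the per-module coefficients are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class KernelEstimate:
    runtime_s: float
    utilization: Dict[str, float]  # per-module utilization in [0, 1]


def tiled_gemm_traffic(m: int, n: int, k: int, tile: int, bytes_per_elem: int) -> int:
    """Bytes of DRAM traffic for a square-tiled GEMM (assumed access pattern).

    Assumption: A is re-read once per column tile of C, B once per row tile
    of C, and C is written once. Real kernels may behave differently.
    """
    tiles_m = math.ceil(m / tile)
    tiles_n = math.ceil(n / tile)
    elems = m * k * tiles_n + k * n * tiles_m + m * n
    return elems * bytes_per_elem


def roofline_estimate(flops: float, bytes_moved: float,
                      peak_flops: float, peak_bw: float) -> KernelEstimate:
    """Roofline stand-in for a performance model: runtime is bounded by the
    slower of compute and memory; utilization is each module's busy fraction."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    runtime = max(compute_time, memory_time)
    return KernelEstimate(
        runtime_s=runtime,
        utilization={"compute": compute_time / runtime,
                     "dram": memory_time / runtime},
    )


def dynamic_power(utilization: Dict[str, float], coeffs_w: Dict[str, float],
                  freq_scale: float = 1.0) -> float:
    """Linear power model: sum of (fitted watts at full utilization) x utilization.

    freq_scale applies the simplification that dynamic power scales linearly
    with clock frequency at fixed voltage.
    """
    return freq_scale * sum(coeffs_w[mod] * u for mod, u in utilization.items())


if __name__ == "__main__":
    m = n = k = 4096
    flops = 2 * m * n * k
    traffic = tiled_gemm_traffic(m, n, k, tile=128, bytes_per_elem=2)
    # Peak numbers and coefficients below are placeholders, not measured values.
    est = roofline_estimate(flops, traffic, peak_flops=300e12, peak_bw=1.5e12)
    watts = dynamic_power(est.utilization, {"compute": 200.0, "dram": 80.0})
    print(f"runtime={est.runtime_s:.6f}s util={est.utilization} power={watts:.1f}W")
```

In a sketch like this, the coefficients in `coeffs_w` would be fitted against measured power on real hardware, and the analytical traffic formula replaces the profiling or simulation run that would otherwise supply utilization.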

