Datacenter operators and electrical utilities rely on power traces at different spatiotemporal scales. Operators use fine-grained traces for provisioning, facility management, and scheduling, while utilities use site-level load profiles for capacity and interconnection planning. Existing datacenter power models do not capture LLM inference workloads, in which GPUs shift rapidly among compute-intensive prefill, lower-power decode, and idle states, and facility demand depends on how these states evolve and synchronize across many devices. We show that LLM inference power can be represented compositionally through two components: workload-driven transitions among operating states and configuration-specific power distributions within those states. Building on this observation, we develop a trace-generation framework that learns from measured traces and synthesizes power profiles for new traffic conditions and serving configurations. These traces aggregate from GPU servers to rack-, row-, and facility-scale load profiles at the temporal granularity required by the study. Across multiple LLMs, tensor-parallel settings, and GPU generations, our framework achieves median absolute energy error below 5% for most configurations while preserving temporal autocorrelation structure. The resulting traces support downstream analyses including oversubscription, power modulation, and utility-facing load characterization, enabling infrastructure evaluations that flat nameplate assumptions and static trace replay cannot support.
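The compositional view described above can be illustrated with a minimal sketch. It is not the framework itself: the state names follow the abstract, but the fixed Markov transition table, the Gaussian per-state power distributions, every numeric value, and the helper names (`gpu_trace`, `aggregate`, `downsample`) are hypothetical placeholders chosen for illustration. In the framework, transitions are driven by the workload and distributions are learned from measured traces.

```python
import random

# Hypothetical per-state power distributions as (mean watts, std watts).
# Placeholder values for illustration only, not measurements.
STATE_POWER = {
    "idle": (100.0, 5.0),
    "decode": (350.0, 20.0),
    "prefill": (600.0, 30.0),
}

# Hypothetical per-step transition probabilities. A fixed Markov chain
# stands in here for the workload-driven transitions named in the text.
TRANSITIONS = {
    "idle": {"idle": 0.90, "prefill": 0.10, "decode": 0.00},
    "prefill": {"idle": 0.00, "prefill": 0.50, "decode": 0.50},
    "decode": {"idle": 0.05, "prefill": 0.05, "decode": 0.90},
}


def next_state(state, rng):
    """Sample the next operating state from the transition table."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def gpu_trace(steps, rng, start="idle"):
    """Generate one GPU power trace: sample power within the state, then transition."""
    state, trace = start, []
    for _ in range(steps):
        mean, std = STATE_POWER[state]
        trace.append(max(0.0, rng.gauss(mean, std)))  # clamp at 0 W
        state = next_state(state, rng)
    return trace


def aggregate(traces):
    """Sum equal-length traces sample by sample (GPU -> server -> rack ...)."""
    return [sum(samples) for samples in zip(*traces)]


def downsample(trace, window):
    """Average over non-overlapping windows to reach a coarser time step."""
    return [
        sum(trace[i:i + window]) / window
        for i in range(0, len(trace) - window + 1, window)
    ]


rng = random.Random(0)  # seeded so the sketch is reproducible
server = aggregate([gpu_trace(600, rng) for _ in range(8)])  # 8 GPUs, 600 steps
coarse = downsample(server, 60)  # e.g. 1-second steps averaged to 1-minute steps
```

Separating the state sequence from the per-state power draw is what lets either part be swapped independently in such a sketch: a new traffic pattern changes only the transitions, while a new model or GPU changes only the distributions. The GPUs here are simulated independently, so the sketch does not model the cross-device synchronization that the abstract identifies as a driver of facility demand.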