AI power demand is growing at an unprecedented rate, while many power grids are ailing and struggle to keep up. Grid expansion entails high capital expenditure and long-distance transmission losses, yet renewable energy is abundant at the source; it is simply not matched to demand. This paper proposes a complementary AI infrastructure deployment model, AI Greenferencing, that brings modular AI compute to renewable energy sources, with a focus on wind. The model allows the AI footprint to expand, generates local behind-the-meter demand for renewable sites, and helps ease the growing strain on power utilities. Our feasibility analysis shows that 890+ GW of wind capacity lies within 50 ms network round-trip time of Azure data centers, and that site-wise right-sizing combined with the spatial complementarity of wind energy keeps aggregate fleet utilization on par with traditional deployments. To serve inference requests under variable wind power, we build XWind, a lightweight, reactive, and workload-agnostic AI inference router that uses only real-time signals (inference latency, KV-cache utilization, and queue depth) to dynamically configure sites and distribute requests. Evaluated on a real 64-GPU A100 testbed that emulates three wind-powered sites and replays Azure production traces, XWind reduces P99 end-to-end latency by up to 52% over the strongest competing policy (also of our own design) and by up to 98% over baselines such as power capping and GPU idling, with consistent gains across workload types, load levels, and GPU generations.
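To make the idea of routing on real-time signals concrete, the following is a minimal sketch of how inference latency, KV-cache utilization, and queue depth could be folded into per-site routing weights. It is not XWind's actual algorithm, which the abstract does not specify. The names `SiteSignals`, `site_load_score`, and `routing_weights`, the equal weighting of the three signals, and the `latency_slo_s` and `max_queue` defaults are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Real-time signals for one site (hypothetical structure)."""
    name: str
    latency_s: float      # recently observed inference latency, in seconds
    kv_cache_util: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_depth: int      # number of requests waiting at the site


def site_load_score(site, latency_slo_s=2.0, max_queue=32):
    """Return a load score in [0, 1]; higher means more loaded.

    Assumption: each signal is normalized to [0, 1] and the three are
    averaged with equal weight. The normalization constants are illustrative.
    """
    latency_term = min(max(site.latency_s / latency_slo_s, 0.0), 1.0)
    kv_term = min(max(site.kv_cache_util, 0.0), 1.0)
    queue_term = min(max(site.queue_depth / max_queue, 0.0), 1.0)
    return (latency_term + kv_term + queue_term) / 3.0


def routing_weights(sites, latency_slo_s=2.0, max_queue=32):
    """Split traffic across sites in proportion to their remaining headroom."""
    headroom = {
        s.name: 1.0 - site_load_score(s, latency_slo_s, max_queue)
        for s in sites
    }
    total = sum(headroom.values())
    if total <= 0.0:
        # Every site is saturated: fall back to an even split.
        return {name: 1.0 / len(headroom) for name in headroom}
    return {name: h / total for name, h in headroom.items()}


if __name__ == "__main__":
    sites = [
        SiteSignals("site_a", latency_s=0.5, kv_cache_util=0.2, queue_depth=4),
        SiteSignals("site_b", latency_s=2.0, kv_cache_util=1.0, queue_depth=32),
        SiteSignals("site_c", latency_s=1.0, kv_cache_util=0.5, queue_depth=16),
    ]
    for name, weight in routing_weights(sites).items():
        print(f"{name}: {weight:.3f}")
```

In this sketch a saturated site (such as `site_b`, which is at the assumed latency target with a full KV cache and a full queue) receives no new traffic, and the remaining sites share requests according to their headroom. A reactive router of this kind needs no workload forecast or wind-power prediction, which matches the "lightweight, reactive" framing above, although the real system also reconfigures sites, which this sketch does not attempt.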