AI inference is becoming a persistent and geographically distributed source of electricity demand. Unlike many traditional electrical loads, inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable. This paper studies when such digital relocation of computation can be interpreted as latency-constrained relocation of electricity demand. We develop an energy-geography framework for geo-distributed AI inference. The framework models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions. The key object is the energy-latency frontier: the marginal cost and carbon benefit unlocked by relaxing inference latency budgets. The paper makes four contributions. First, it distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it formulates a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition. Fourth, it provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers. The results show that latency relaxation expands feasible geography, while migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits.
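As a rough illustration of the placement formulation summarized above, the sketch below applies a latency feasibility mask and then picks the cheapest feasible compute region, with a flat migration friction charged whenever a request leaves its home region. It is a minimal toy under stated assumptions, not the paper's model: the region names, prices, carbon intensities, PUE values, latencies, the per-request energy `it_kwh`, and the helpers `place` and `frontier` are all hypothetical, and capacity limits, egress costs, state locality, and legal constraints are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    price: float       # electricity price, $/kWh (made-up value)
    carbon: float      # marginal carbon intensity, kgCO2/kWh (made-up value)
    pue: float         # power usage effectiveness
    latency_ms: float  # network latency from the service node


# Hypothetical regions; every number here is illustrative only.
REGIONS = [
    Region("local", 0.20, 0.40, 1.30, 5.0),
    Region("regional", 0.12, 0.30, 1.20, 40.0),
    Region("remote", 0.05, 0.05, 1.10, 150.0),
]


def cost_per_request(region, it_kwh, home, friction):
    """Facility electricity cost plus a flat friction when leaving `home`."""
    energy_cost = region.price * region.pue * it_kwh
    return energy_cost + (friction if region.name != home else 0.0)


def place(latency_budget_ms, it_kwh=0.001, home="local", friction=0.0):
    """Return the cheapest region that fits the latency budget, or None."""
    # Feasibility mask: keep only regions within the latency budget.
    feasible = [r for r in REGIONS if r.latency_ms <= latency_budget_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda r: cost_per_request(r, it_kwh, home, friction))


def frontier(budgets, **kwargs):
    """Sweep latency budgets and record the chosen region name for each."""
    return [(b, getattr(place(b, **kwargs), "name", None)) for b in budgets]


if __name__ == "__main__":
    for budget, name in frontier([1, 10, 50, 200]):
        print(budget, name)
```

In this toy setting, relaxing the latency budget widens the feasible set and moves the chosen region from local to regional to remote, while a large enough `friction` keeps execution local even under a loose budget. Replacing `region.price` with `region.carbon` in `cost_per_request` would give the analogous carbon-oriented placement.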