AI inference is becoming a persistent and geographically distributed source of electricity demand. Unlike many traditional electrical loads, inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable. This paper studies when such digital relocation of computation can be interpreted as latency-constrained relocation of electricity demand. We develop an energy-geography framework for geo-distributed AI inference. The framework models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions. The key object is the energy-latency frontier: the marginal cost and carbon benefit unlocked by relaxing inference latency budgets. The paper makes four contributions. First, it distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it formulates a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition. Fourth, it provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers. The results show that latency relaxation expands feasible geography, while migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits.
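To make the placement problem concrete, the following is a minimal sketch, not the paper's model. It assumes a greedy heuristic in place of a full constrained optimization, a single service node, and a per-kWh objective that combines electricity price, a carbon price, PUE, and a flat migration friction. All names (`Region`, `unit_cost`, `place`) and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    price: float       # electricity price, $/kWh (illustrative value)
    carbon: float      # marginal carbon intensity, kgCO2/kWh (illustrative)
    pue: float         # power usage effectiveness (facility / IT energy)
    capacity: float    # IT energy available per period, kWh (illustrative)
    latency_ms: float  # network latency from the service node, ms


def unit_cost(region, carbon_price, home, friction):
    """Assumed objective: cost per kWh of IT energy placed in `region`."""
    base = region.pue * (region.price + carbon_price * region.carbon)
    # Migration friction applies only when work leaves the home region.
    return base + (0.0 if region.name == home else friction)


def place(regions, workloads, home, carbon_price=0.05, friction=0.01):
    """Greedy heuristic: workloads with the tightest latency budget pick first.

    `workloads` is a list of (name, latency_budget_ms, demand_kwh) tuples.
    Returns a dict mapping workload name to region name, or None if no
    region is feasible.
    """
    remaining = {r.name: r.capacity for r in regions}
    plan = {}
    for name, budget_ms, demand in sorted(workloads, key=lambda w: w[1]):
        # Feasibility mask: latency within budget and enough capacity left.
        feasible = [
            r for r in regions
            if r.latency_ms <= budget_ms and remaining[r.name] >= demand
        ]
        if not feasible:
            plan[name] = None
            continue
        best = min(
            feasible,
            key=lambda r: unit_cost(r, carbon_price, home, friction),
        )
        remaining[best.name] -= demand
        plan[name] = best.name
    return plan


# Hypothetical regions and workload classes.
regions = [
    Region("local", price=0.15, carbon=0.40, pue=1.4, capacity=100, latency_ms=5),
    Region("regional", price=0.10, carbon=0.30, pue=1.3, capacity=100, latency_ms=40),
    Region("remote", price=0.04, carbon=0.05, pue=1.1, capacity=100, latency_ms=180),
]
workloads = [
    ("interactive", 20, 50),
    ("chat", 100, 50),
    ("batch", 1000, 50),
]

plan = place(regions, workloads, home="local")
print(plan)
```

With these illustrative inputs, the three workload classes land in the local, regional, and remote regions respectively, mirroring the local, regional, and energy-oriented layers described above. Raising `friction` to 1.0 keeps the first two classes in the local region until its capacity is exhausted, which shows how migration frictions and capacity limits can shrink the realized benefit of a relaxed latency budget.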