GenAI services are in an early but fast-expanding phase. Providers compete on model capability and service quality, while the underlying infrastructure remains expensive and heterogeneous across regions, workloads, and compute assets. If these services diffuse into routine daily use, the engineering problem becomes not only building better models but also dispatching workloads efficiently across a geographically distributed AI service infrastructure. To address this, we formulate a network-constrained token-flow market that clears AI workloads across compute nodes and communication links. The baseline model is a linear program that co-optimizes routing and processing subject to compute-capacity and bandwidth constraints; its dual variables define location- and workload-specific marginal service prices. We further introduce a transfer-aware extension that prices data movement in physical units and isolates bandwidth congestion rents. In a 5-node U.S. case study, the transfer-aware model identifies four saturated backbone links and raises total operating cost by 2.7\% relative to the token-equivalent baseline; separately, tightening the chatbot latency limit from 100~ms to 15~ms increases one locational price by 117\%. A 20-node scale-up exhibits the same merit-order dispatch logic and becomes infeasible once demand exceeds aggregate capacity. These results suggest that locational pricing is a useful organizing principle for operating an emerging AI service infrastructure and, over time, for designing competitive markets around it.
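To illustrate how locational prices and congestion rents can emerge from a capacity-constrained dispatch problem, the Python sketch below solves a hypothetical two-node network (nodes A and B, one bidirectional link) with a single workload class. All names, costs, and capacities are illustrative assumptions, not the paper's formulation or data, which involve multiple workloads, latency limits, and many links. Rather than reading dual variables from an LP solver, the sketch recovers each locational price as the exact finite-difference change in optimal cost when demand at that node grows by a small increment. Away from degenerate breakpoints, this should match the LP dual price.

```python
from fractions import Fraction as F

# Hypothetical toy model (not the paper's formulation): two compute nodes,
# A and B, one bidirectional link, and a single workload class.
# Flow f > 0 ships work requested at B to A for processing; f < 0 ships A -> B.


def dispatch_cost(d_a, d_b, c_a, c_b, k_a, k_b, t, cap):
    """Return (min operating cost, optimal flow f), or None if infeasible.

    d_*: demand, c_*: processing cost per unit, k_*: compute capacity,
    t: transfer cost per unit shipped, cap: link bandwidth in units.
    """
    # Feasible range of f: link capacity, node capacities, and
    # non-negative processing at each node.
    lo = max(-cap, -d_a, d_b - k_b)
    hi = min(cap, k_a - d_a, d_b)
    if lo > hi:
        return None  # demand cannot be served within capacity

    def cost(f):
        return c_a * (d_a + f) + c_b * (d_b - f) + t * abs(f)

    # Cost is convex piecewise linear in f, so an optimum lies at an
    # endpoint of [lo, hi] or at the kink f = 0.
    candidates = [lo, hi] + ([F(0)] if lo <= 0 <= hi else [])
    best = min(candidates, key=cost)
    return cost(best), best


def locational_prices(params, d_a, d_b, eps=F(1, 1000)):
    """Marginal cost of serving one more unit of demand at A and at B.

    Uses exact rational finite differences; assumes the perturbed
    problems stay feasible and eps does not cross a breakpoint.
    """
    base, _ = dispatch_cost(d_a, d_b, **params)
    up_a, _ = dispatch_cost(d_a + eps, d_b, **params)
    up_b, _ = dispatch_cost(d_a, d_b + eps, **params)
    return (up_a - base) / eps, (up_b - base) / eps


# Illustrative numbers: A is the cheap node, B the expensive one.
params = dict(c_a=F(2), c_b=F(5), k_a=F(100), k_b=F(100), t=F(1, 2), cap=F(30))
cost, flow = dispatch_cost(F(40), F(80), **params)
p_a, p_b = locational_prices(params, F(40), F(80))
rent = p_b - p_a - params["t"]  # congestion rent per unit on the A-B link
print(cost, flow)  # 405 30  (link saturated)
print(p_a, p_b, rent)  # 2 5 5/2
```

Dispatch follows merit order. Work flows to the cheap node A until the 30-unit link saturates, so the prices separate (2 at A, 5 at B). The gap beyond the transfer cost, 5 - 2 - 0.5 = 2.5 per unit, is the link's congestion rent. If the link cap is raised to 100, node A's compute capacity binds instead. A's price then rises to 4.5 and the rent vanishes. If demand exceeds the combined capacity, `dispatch_cost` returns `None`, which mirrors the infeasibility reported for the scale-up.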