This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. Each DC has on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. The central question is: how should inference workloads be distributed across the DCs to minimize energy consumption, carbon emissions, and water usage while improving user experience? This letter proposes a novel optimization model that LLM service providers can use to reduce operational costs and environmental impacts. Numerical results validate the efficacy of the proposed approach.
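The abstract does not spell out the model itself. As a rough illustration of the allocation problem it poses, here is a minimal sketch of a single time slot of a toy version. It rests on several assumptions: the carbon and water objectives and a latency proxy for user experience are collapsed into one weighted linear cost alongside the electricity bill; on-site renewable energy is treated as free and carbon-free; and water use is assumed proportional to energy regardless of its source. Every class, parameter, and weight here (`EdgeDC`, `allocate`, `w_carbon`, `w_water`, `w_latency`, and the example numbers) is hypothetical and is not the formulation proposed in this letter.

```python
from dataclasses import dataclass


@dataclass
class EdgeDC:
    # Hypothetical per-time-slot parameters for one edge data center.
    name: str
    capacity: float        # max requests it can serve in this slot
    energy_per_req: float  # kWh consumed per inference request
    renewable_kwh: float   # on-site renewable energy available (kWh)
    price: float           # grid electricity price ($/kWh)
    carbon: float          # grid carbon intensity (kg CO2/kWh)
    water: float           # water use per kWh (L/kWh), assumed source-independent
    latency: float         # latency penalty per request (ms), a user-experience proxy


def allocate(dcs, demand, w_carbon=0.1, w_water=0.01, w_latency=0.001):
    """Split `demand` requests across DCs to minimize a weighted linear cost.

    Each DC's cost is piecewise linear and convex in its load: requests
    covered by on-site renewables pay no grid price or carbon, the rest do.
    With non-negative prices, intensities, and weights, filling the cheapest
    marginal segments first is optimal for this toy model.
    """
    segments = []  # (marginal cost per request, DC name, requests in segment)
    for dc in dcs:
        # Cost paid by every request at this DC: water plus latency penalty.
        base = dc.energy_per_req * w_water * dc.water + w_latency * dc.latency
        # Requests fully powered by on-site renewables.
        green = min(dc.capacity, dc.renewable_kwh / dc.energy_per_req)
        # Requests beyond that draw grid power: add price and carbon cost.
        grid = base + dc.energy_per_req * (dc.price + w_carbon * dc.carbon)
        segments.append((base, dc.name, green))
        segments.append((grid, dc.name, dc.capacity - green))

    alloc = {dc.name: 0.0 for dc in dcs}
    remaining, total_cost = demand, 0.0
    for cost, name, size in sorted(segments):  # cheapest segments first
        if remaining <= 0:
            break
        take = min(size, remaining)
        alloc[name] += take
        total_cost += take * cost
        remaining -= take
    if remaining > 1e-9:
        raise ValueError("demand exceeds total DC capacity")
    return alloc, total_cost


# Example with made-up numbers: A has renewables but pricey grid power and
# higher latency; B has no renewables but cheap, cleaner grid power.
dcs = [
    EdgeDC("A", capacity=1000, energy_per_req=0.002, renewable_kwh=1.0,
           price=0.30, carbon=0.5, water=1.8, latency=40),
    EdgeDC("B", capacity=800, energy_per_req=0.003, renewable_kwh=0.0,
           price=0.10, carbon=0.2, water=1.0, latency=20),
]
alloc, cost = allocate(dcs, demand=1200)
print(alloc)  # → {'A': 400.0, 'B': 800.0}
```

Here B is filled first because its lower latency penalty outweighs A's free renewable energy under these weights; changing the weights shifts the split. Because the toy cost is separable and convex, the greedy fill is exact, but a multi-period version with time-varying prices and renewable output, or with coupling constraints such as storage or deferrable requests, would more naturally be posed as a linear or mixed-integer program.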