This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a lexicographic multi-objective optimization framework that addresses this challenge without requiring manual weight tuning. The proposed model incorporates real-world constraints, including token-dependent processing delay and energy consumption, heterogeneous hardware capabilities, dynamic renewable generation, and spatiotemporal variations in electricity prices and carbon intensity. Unlike existing approaches that optimize individual environmental metrics in isolation, Green-LLM jointly minimizes operational cost, carbon emissions, and delay penalties while enforcing water consumption constraints, so that both sustainability and quality-of-service requirements are met. Numerical results demonstrate that Green-LLM achieves significant reductions in carbon emissions and water consumption while keeping operational costs within 3% of the minimum and response latency below 2 seconds. These findings show that sustainable LLM inference can be achieved without sacrificing service quality or economic efficiency.
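To make the lexicographic idea concrete, the following is a minimal sketch of priority-ordered minimization on a toy two-site problem. It is not the authors' formulation or solver. The site data, the demand, the water cap, the priority order (cost, then carbon, then delay), and the use of a 3% relative tolerance on cost are all assumptions chosen for illustration; the abstract does not specify them. The helper names `total`, `feasible_allocations`, and `lexicographic_min` are hypothetical.

```python
# Toy illustration only: all numbers and names below are hypothetical,
# not taken from the paper.
SITES = {
    "A": {"price": 0.10, "carbon": 0.50, "water": 2.0, "delay": 0.2},
    "B": {"price": 0.12, "carbon": 0.10, "water": 1.0, "delay": 0.4},
}
DEMAND = 10        # request batches to place across the two sites
WATER_CAP = 19.0   # hard cap on total water use (a constraint, not an objective)


def total(alloc, key):
    """Sum a per-batch metric over an allocation {site: batches}."""
    return sum(SITES[s][key] * n for s, n in alloc.items())


def feasible_allocations():
    """Enumerate every split of DEMAND that respects the water cap."""
    for a in range(DEMAND + 1):
        alloc = {"A": a, "B": DEMAND - a}
        if total(alloc, "water") <= WATER_CAP:
            yield alloc


def lexicographic_min(candidates, stages):
    """Minimize objectives in priority order.

    stages is a list of (metric_key, relative_tolerance). After each stage,
    only candidates within the tolerance of that stage's optimum survive.
    """
    survivors = list(candidates)
    for key, tol in stages:
        best = min(total(c, key) for c in survivors)
        limit = best * (1 + tol) + 1e-9  # small slack for float rounding
        survivors = [c for c in survivors if total(c, key) <= limit]
    return survivors[0]


# Assumed priority order: cost (3% slack), then carbon, then delay.
STAGES = [("price", 0.03), ("carbon", 0.0), ("delay", 0.0)]
best = lexicographic_min(feasible_allocations(), STAGES)
print(best)
```

With these toy numbers the sketch places 8 batches at site A and 2 at site B. The cheapest allocation allowed by the water cap is 9 and 1, and the 3% cost slack lets one more batch move to the lower-carbon site. No objective weights are tuned at any point; only the priority order and the per-stage tolerances are chosen. A realistic model would replace the brute-force enumeration with a sequence of constrained optimization solves.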