Serverless computing simplifies cloud deployment but introduces new challenges in managing service latency and carbon emissions. Reducing cold-start latency requires retaining warm function instances, while minimizing carbon emissions favors reclaiming idle resources. This balance is further complicated by time-varying grid carbon intensity and fluctuating workload patterns, under which static keep-alive policies are inefficient. We present LACE-RL, a latency-aware and carbon-efficient management framework that formulates serverless pod retention as a sequential decision problem. LACE-RL uses deep reinforcement learning to dynamically tune keep-alive durations, jointly modeling cold-start probability, function-specific latency costs, and real-time carbon intensity. Using the Huawei Public Cloud Trace, we show that LACE-RL reduces cold starts by 51.69% and idle keep-alive carbon emissions by 77.08% compared to Huawei's static policy, while achieving better latency-carbon trade-offs than state-of-the-art heuristic and single-objective baselines, approaching Oracle performance.
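To make the latency-carbon trade-off concrete, the following is a minimal sketch (not the paper's implementation) of a reward signal an RL agent could use to balance cold-start latency cost against idle keep-alive emissions. All function names, weights, and power figures below are illustrative assumptions, not values from LACE-RL.

```python
# Hypothetical reward: penalize cold-start latency and the carbon emitted
# while a pod sits idle under a keep-alive policy. Assumed, not from the paper.

def reward(cold_start: bool, latency_cost: float,
           idle_seconds: float, pod_power_kw: float,
           carbon_intensity_g_per_kwh: float,
           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Return a negative cost combining latency and carbon penalties."""
    # Latency penalty applies only when the request hits a cold start.
    latency_penalty = latency_cost if cold_start else 0.0
    # Energy consumed keeping the pod warm, converted to grams of CO2
    # via the current grid carbon intensity.
    idle_energy_kwh = pod_power_kw * (idle_seconds / 3600.0)
    carbon_penalty = idle_energy_kwh * carbon_intensity_g_per_kwh
    return -(alpha * latency_penalty + beta * carbon_penalty)

# Example: a warm hit after 10 minutes of idle retention on a 0.05 kW pod
# when the grid is at 400 gCO2/kWh.
r = reward(cold_start=False, latency_cost=2.5,
           idle_seconds=600, pod_power_kw=0.05,
           carbon_intensity_g_per_kwh=400.0)
```

A higher carbon intensity makes idle retention more expensive in this sketch, so a learned policy would shorten keep-alive windows when the grid is dirty and lengthen them when it is clean, which matches the behavior the abstract describes.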