Cloud platforms are increasingly emphasizing sustainable operations to reduce their operational carbon footprint. One approach to reducing emissions is to exploit the temporal flexibility inherent in many cloud workloads by executing them in time periods with the greenest electricity supply and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion times, we present a new approach that exploits the elasticity of batch workloads in the cloud to optimize their carbon emissions. Our approach is based on the notion of carbon scaling, similar to cloud autoscaling, where a job's server allocation is varied dynamically based on fluctuations in the carbon cost of the grid's electricity supply. We present an optimal greedy algorithm for minimizing a job's emissions through carbon scaling and implement a prototype of our \systemName system in Kubernetes using its autoscaling capabilities, along with an analytic tool to guide the carbon-efficient deployment of batch applications in the cloud. We evaluate CarbonScaler using real-world machine learning training and MPI jobs on a commercial cloud platform and show that \systemName can yield up to 50\% carbon savings over a carbon-agnostic execution and up to 35\% over state-of-the-art suspend-resume policies.
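To make the idea of carbon scaling concrete, the following is a minimal sketch of one greedy allocation strategy; it is not the algorithm from the paper, whose details are not given in the abstract. It assumes a known carbon-intensity forecast per time slot, a job whose marginal work per additional server is non-increasing (diminishing returns), and one unit of energy per server per slot. The names `carbon_scale`, `emissions`, `marginal_work`, and `total_work` are hypothetical and chosen for illustration only.

```python
def carbon_scale(carbon, marginal_work, total_work):
    """Pick a server count per time slot with a greedy work-per-carbon rule.

    carbon        -- forecast carbon intensity for each time slot (assumed known)
    marginal_work -- work added per slot by the 1st, 2nd, ... server
                     (assumed non-increasing, i.e. diminishing returns)
    total_work    -- amount of work the job must complete
    """
    if any(a < b for a, b in zip(marginal_work, marginal_work[1:])):
        raise ValueError("marginal_work must be non-increasing")

    # One candidate per (slot, k-th server): the work gained per unit of carbon.
    candidates = [
        (work / intensity, slot, k)
        for slot, intensity in enumerate(carbon)
        for k, work in enumerate(marginal_work, start=1)
    ]
    # Best ratio first; ties broken by slot, then by lower server index,
    # so each slot's servers are always added in order 1, 2, 3, ...
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    schedule = [0] * len(carbon)
    done = 0.0
    for _, slot, k in candidates:
        if done >= total_work:
            break
        schedule[slot] = k
        done += marginal_work[k - 1]

    if done < total_work:
        raise ValueError("job cannot finish within the given window")
    return schedule


def emissions(carbon, schedule):
    """Total emissions, assuming one unit of energy per server per slot."""
    return sum(c * s for c, s in zip(carbon, schedule))


if __name__ == "__main__":
    forecast = [300, 100, 200, 400]   # toy carbon intensities per slot
    scaling = [10, 8, 4]              # toy marginal work of servers 1..3
    plan = carbon_scale(forecast, scaling, total_work=28)
    print(plan, emissions(forecast, plan))  # prints [0, 2, 1, 0] 400
```

In this toy example the job runs two servers in the cleanest slot and one in the next cleanest, and stays idle otherwise. Under the same assumptions, a single server started in slot 0 would need three slots and emit 600 units, which illustrates why scaling up during low-carbon periods can beat both a fixed allocation and a pure suspend-resume policy.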