Cloud platforms are increasing their emphasis on sustainability and reducing their operational carbon footprint. A common approach for reducing carbon emissions is to exploit the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion times, we present a new approach that exploits the elasticity of batch workloads in the cloud to optimize their carbon emissions. Our approach is based on the notion of "carbon scaling," similar to cloud autoscaling, where a job dynamically varies its server allocation based on fluctuations in the carbon cost of the grid's energy. We develop a greedy algorithm for minimizing a job's carbon emissions via carbon scaling that is based on the well-known problem of marginal resource allocation. We implement a CarbonScaler prototype in Kubernetes using its autoscaling capabilities and an analytic tool to guide the carbon-efficient deployment of batch applications in the cloud. We then evaluate CarbonScaler using real-world machine learning training and MPI jobs on a commercial cloud platform and show that it can yield i) 51% carbon savings over carbon-agnostic execution; ii) 37% over a state-of-the-art suspend-resume policy; and iii) 8% over the best static scaling policy.
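The greedy carbon-scaling idea described in the abstract can be illustrated with a minimal sketch. This is not the CarbonScaler implementation. It assumes time is divided into equal slots with a known carbon intensity forecast, that every server draws the same energy per slot, and that the job's marginal work per added server is non-increasing (diminishing returns from scaling). The names `carbon_scale`, `emissions`, `carbon`, `marginal_work`, and `total_work` are hypothetical and chosen for illustration only. Under those assumptions, the sketch repeatedly picks the (slot, next server) pair that yields the most work per unit of carbon until the job's work is covered.

```python
import heapq


def carbon_scale(carbon, marginal_work, total_work):
    """Greedily choose how many servers to run in each time slot.

    carbon[t]        -- assumed carbon intensity forecast for slot t
    marginal_work[k] -- assumed extra work done in one slot by the (k+1)-th
                        server; must be non-increasing for the greedy order
                        to respect "server k before server k+1"
    total_work       -- work the job needs, in the same units as marginal_work
    """
    alloc = [0] * len(carbon)
    # Max-heap (keys negated) holding the next unallocated server of each slot,
    # ranked by work done per unit of carbon.
    heap = [(-marginal_work[0] / c, t) for t, c in enumerate(carbon)]
    heapq.heapify(heap)
    done = 0.0
    while heap and done < total_work:
        _, t = heapq.heappop(heap)
        k = alloc[t]
        alloc[t] = k + 1
        done += marginal_work[k]
        # Offer the following server in the same slot, if the limit allows it.
        if k + 1 < len(marginal_work):
            heapq.heappush(heap, (-marginal_work[k + 1] / carbon[t], t))
    if done < total_work:
        raise ValueError("job cannot finish within the given slots and server limit")
    return alloc


def emissions(alloc, carbon, energy_per_server_slot=1.0):
    """Carbon emitted by a schedule, assuming constant energy per server-slot."""
    return sum(n * c * energy_per_server_slot for n, c in zip(alloc, carbon))


if __name__ == "__main__":
    carbon = [100, 40, 200, 90]        # illustrative values, not measured data
    marginal_work = [1.0, 0.8, 0.5]    # up to 3 servers, diminishing returns
    schedule = carbon_scale(carbon, marginal_work, total_work=3.0)
    print(schedule)                    # [0, 3, 0, 1]
    print(emissions(schedule, carbon)) # 210.0
```

In this toy input, the schedule concentrates three servers in the lowest-carbon slot and adds one server in the next-cleanest slot, which is the scale-up-when-green behavior the abstract describes. A carbon-agnostic run of one server over the first three slots would complete the same 3.0 units of work but emit 340 units under the same assumptions.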