Deep learning has grown significantly in recent years, increasing the energy consumption and carbon emissions associated with using GPUs to train deep neural networks (DNNs). In response to the call for sustainability, conventional solutions have attempted to move training jobs to locations or time frames with lower carbon intensity. However, moving jobs to other locations is not always feasible because of large dataset sizes or data regulations. Moreover, postponing training can degrade application service quality, because the DNNs backing the service are not updated in a timely fashion. In this work, we present a practical solution that reduces the carbon footprint of DNN training without migrating or postponing jobs. Specifically, our solution observes real-time shifts in carbon intensity during training and controls the energy consumption of GPUs accordingly, reducing the carbon footprint while maintaining training performance. Furthermore, to adapt proactively to shifting carbon intensity, we propose a lightweight machine learning algorithm that predicts the carbon intensity of the upcoming time frame. Our solution, Chase, reduces the total carbon footprint of training ResNet-50 on ImageNet by 13.6% while increasing training time by only 2.5%.
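As a rough illustration of the two mechanisms described above (forecasting the carbon intensity of the next time frame, then throttling GPU energy use when the grid is more carbon-intensive), here is a minimal Python sketch. It is not Chase's algorithm. A least-squares trend over recent readings stands in for the lightweight ML predictor. The linear mapping from intensity to a GPU power limit, the thresholds, and the helper names predict_next, choose_power_limit, and carbon_grams are all hypothetical choices made for illustration.

```python
from typing import Sequence


def predict_next(history: Sequence[float], window: int = 4) -> float:
    """Forecast the next carbon-intensity value (gCO2/kWh).

    Hypothetical stand-in predictor: fit a least-squares line to the last
    `window` observations and extrapolate one step ahead.
    """
    if not history:
        raise ValueError("need at least one carbon-intensity observation")
    recent = list(history[-window:])
    n = len(recent)
    if n == 1:
        return recent[0]
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
    slope = sxy / sxx
    # Extrapolate to x = n, one step past the last observation.
    return y_mean + slope * (n - x_mean)


def choose_power_limit(intensity: float, low: float, high: float,
                       min_watts: float, max_watts: float) -> float:
    """Map carbon intensity to a GPU power limit in watts (hypothetical policy).

    Full power at or below `low`, minimum power at or above `high`,
    and a linear interpolation in between.
    """
    if intensity <= low:
        return max_watts
    if intensity >= high:
        return min_watts
    frac = (intensity - low) / (high - low)
    return max_watts - frac * (max_watts - min_watts)


def carbon_grams(power_watts: float, hours: float, intensity: float) -> float:
    """Emissions in gCO2 = energy (kWh) * carbon intensity (gCO2/kWh)."""
    return power_watts / 1000.0 * hours * intensity


if __name__ == "__main__":
    # Illustrative, made-up carbon-intensity readings in gCO2/kWh.
    trace = [200.0, 220.0, 240.0, 260.0]
    forecast = predict_next(trace)
    limit = choose_power_limit(forecast, low=200.0, high=400.0,
                               min_watts=150.0, max_watts=300.0)
    print(f"forecast={forecast:.1f} gCO2/kWh, power limit={limit:.1f} W")
    print(f"emissions for one hour: {carbon_grams(limit, 1.0, forecast):.1f} g")
```

The sketch only computes a target power limit. Applying it on NVIDIA hardware would go through a driver interface such as NVML's nvmlDeviceSetPowerManagementLimit or `nvidia-smi -pl`, though the abstract does not say which control knob Chase uses. A real controller would also need to account for the longer step time that a lower power limit can cause. The abstract reports that trade-off for Chase as a 2.5% increase in training time.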