Deep learning has experienced significant growth in recent years, resulting in increased energy consumption and carbon emissions from the use of GPUs for training deep neural networks (DNNs). Answering the call for sustainability, conventional solutions have attempted to move training jobs to locations or time frames with lower carbon intensity. However, moving jobs to other locations may not always be feasible due to large dataset sizes or data regulations. Moreover, postponing training can degrade application service quality because the DNNs backing the service are not updated in a timely fashion. In this work, we present a practical solution that reduces the carbon footprint of DNN training without migrating or postponing jobs. Specifically, our solution observes real-time carbon intensity shifts during training and controls the energy consumption of GPUs, thereby reducing the carbon footprint while maintaining training performance. Furthermore, to adapt proactively to shifting carbon intensity, we propose a lightweight machine learning algorithm that predicts the carbon intensity of the upcoming time frame. Our solution, Chase, reduces the total carbon footprint of training ResNet-50 on ImageNet by 13.6% while increasing training time by only 2.5%.
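The control idea described above can be illustrated with a minimal sketch. It is not the Chase algorithm: the abstract does not say which GPU control knob or which predictor is used, so the code assumes a GPU power cap as the knob (on NVIDIA hardware this could be applied with `nvidia-smi -pl <watts>`) and substitutes a plain moving average for the learned predictor. All function names, thresholds, and wattages are hypothetical.

```python
def carbon_footprint_g(power_watts, hours, intensity_g_per_kwh):
    """Emissions in gCO2: energy drawn (kWh) times carbon intensity (gCO2/kWh)."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh * intensity_g_per_kwh


def predict_next_intensity(history, window=3):
    """Stand-in predictor (assumption): mean of the last `window` observations.

    The paper proposes a lightweight ML model; a moving average is used here
    only to keep the sketch self-contained.
    """
    recent = history[-window:]
    return sum(recent) / len(recent)


def choose_power_limit(predicted, low, high, min_watts, max_watts):
    """Hypothetical policy: map predicted intensity to a GPU power cap.

    At or below `low` the GPU runs at full power; at or above `high` it runs
    at the minimum cap; in between the cap is interpolated linearly.
    """
    if predicted <= low:
        return max_watts
    if predicted >= high:
        return min_watts
    frac = (predicted - low) / (high - low)
    return max_watts - frac * (max_watts - min_watts)


if __name__ == "__main__":
    # Made-up hourly carbon intensity readings in gCO2/kWh.
    observed = [220.0, 260.0, 310.0, 380.0, 420.0]
    for t in range(3, len(observed) + 1):
        forecast = predict_next_intensity(observed[:t])
        cap = choose_power_limit(forecast, low=200.0, high=400.0,
                                 min_watts=150.0, max_watts=250.0)
        print(f"after {t} readings: forecast={forecast:.1f} gCO2/kWh, cap={cap:.1f} W")
```

Lowering the cap when the grid is carbon-intensive trades some throughput for lower emissions during those intervals, which is the kind of trade-off behind the reported 13.6% footprint reduction at a 2.5% training-time cost.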