Federated Learning (FL) is an emerging machine learning technique that enables distributed model training across data silos or edge devices without sharing data. Yet, FL inevitably introduces inefficiencies compared to centralized model training, which will further increase the already high energy usage and associated carbon emissions of machine learning. Although the scheduling of workloads based on the availability of low-carbon energy has received considerable attention in recent years, it has not yet been investigated in the context of FL. However, FL is a highly promising use case for carbon-aware computing, as training jobs consist of energy-intensive batch processes scheduled in geo-distributed environments. We propose FedZero, an FL system that operates exclusively on renewable excess energy and spare capacity of compute infrastructure to effectively reduce the training's operational carbon emissions to zero. Based on energy and load forecasts, FedZero leverages the spatio-temporal availability of excess energy by cherry-picking clients for fast convergence and fair participation. Our evaluation, based on real solar and load traces, shows that FedZero converges considerably faster under the mentioned constraints than state-of-the-art approaches, is highly scalable, and is robust against forecasting errors.
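To make the idea of forecast-based client selection concrete, the following is a minimal sketch and not FedZero's actual algorithm. It assumes a simple greedy rule: a client is eligible for a round only if its forecast excess energy and spare capacity allow a minimum number of batches, and eligible clients with fewer past participations are preferred. All names (`Client`, `feasible_batches`, `select_clients`) and all numbers are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Client:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    excess_energy_wh: list[float]  # forecast excess renewable energy per time step
    spare_batches: list[int]       # forecast spare compute capacity per time step (batches)
    energy_per_batch_wh: float     # energy needed to train one batch
    past_rounds: int = 0           # how often the client has participated so far


def feasible_batches(client: Client, horizon: int) -> int:
    """Batches the client can compute within `horizon` time steps using only
    forecast excess energy and forecast spare capacity."""
    total = 0
    steps = zip(client.excess_energy_wh[:horizon], client.spare_batches[:horizon])
    for energy, capacity in steps:
        by_energy = int(energy // client.energy_per_batch_wh)
        total += min(by_energy, capacity)
    return total


def select_clients(clients: list[Client], n: int, min_batches: int, horizon: int) -> list[Client]:
    """Greedy selection: keep clients that can finish at least `min_batches`,
    then prefer fewer past participations, breaking ties by more feasible batches."""
    scored = [(c, feasible_batches(c, horizon)) for c in clients]
    eligible = [(c, b) for c, b in scored if b >= min_batches]
    eligible.sort(key=lambda cb: (cb[0].past_rounds, -cb[1]))
    return [c for c, _ in eligible[:n]]


# Hypothetical forecasts for four clients over three time steps.
clients = [
    Client("a", [10.0, 20.0, 0.0], [5, 5, 5], 2.0, past_rounds=3),
    Client("b", [0.0, 4.0, 6.0], [10, 10, 10], 2.0, past_rounds=0),
    Client("c", [1.0, 1.0, 1.0], [10, 10, 10], 2.0, past_rounds=0),
    Client("d", [8.0, 8.0, 8.0], [2, 2, 2], 1.0, past_rounds=1),
]
selected = select_clients(clients, n=2, min_batches=5, horizon=3)
print([c.name for c in selected])
```

In this toy setting, client `c` is excluded because its forecast excess energy never covers a single batch, and client `a` is passed over despite having the most feasible batches because it has already participated most often. A real system would additionally have to account for forecast uncertainty, which this sketch ignores.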