Federated Learning (FL) is an emerging machine learning technique that enables distributed model training across data silos or edge devices without data sharing. Yet, FL inevitably introduces inefficiencies compared to centralized model training, which will further increase the already high energy usage and associated carbon emissions of machine learning. One idea to reduce FL's carbon footprint is to schedule training jobs based on the availability of renewable excess energy, which can occur at certain times and places in the grid. However, in the presence of such volatile and unreliable resources, existing FL schedulers cannot always ensure fast, efficient, and fair training. We propose FedZero, an FL system that operates exclusively on renewable excess energy and spare capacity of compute infrastructure to effectively reduce a training's operational carbon emissions to zero. Using energy and load forecasts, FedZero leverages the spatio-temporal availability of excess resources by selecting clients for fast convergence and fair participation. Our evaluation, based on real solar and load traces, shows that FedZero converges significantly faster than existing approaches under these constraints while consuming less energy. Furthermore, it is robust to forecasting errors and scalable to tens of thousands of clients.