Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to an increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data transfer, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to cope with the unpredictability of future carbon intensity, and devises an efficient algorithm to tackle the combinatorial complexity of data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.
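To make the drift-plus-penalty idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual CAFE algorithm): a virtual queue tracks cumulative overrun of a per-round carbon budget, and each round a data center is selected when its weighted learning utility outweighs its queue-scaled carbon cost. All function names, the per-center utility values, and the greedy per-center scoring rule are illustrative assumptions.

```python
def select_data_centers(utilities, carbon, queue, v):
    """Pick centers whose drift-plus-penalty score is positive.

    Score for center i: v * utilities[i] - queue * carbon[i],
    where v trades off learning utility against carbon cost.
    (Greedy per-center scoring is an assumption for illustration.)
    """
    return [i for i in range(len(utilities))
            if v * utilities[i] - queue * carbon[i] > 0]


def run_rounds(utilities_per_round, carbon_per_round, per_round_budget, v=10.0):
    """Simulate training rounds under a carbon budget via a virtual queue."""
    queue = 0.0  # virtual queue: cumulative carbon-budget overrun
    history = []
    for u, c in zip(utilities_per_round, carbon_per_round):
        chosen = select_data_centers(u, c, queue, v)
        spent = sum(c[i] for i in chosen)
        # Lyapunov virtual-queue update: grows when spending exceeds
        # the per-round budget, shrinks (toward zero) when under budget.
        queue = max(queue + spent - per_round_budget, 0.0)
        history.append((chosen, spent, queue))
    return history
```

When the queue is empty, all centers look attractive; once the budget is exceeded, the growing queue suppresses selection of high-carbon centers until the average spending falls back under budget, which is the stabilizing behavior drift-plus-penalty methods provide.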