Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to an increased carbon footprint with potential environmental repercussions. This paper examines the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution: it exchanges model parameters rather than raw data, which preserves data privacy and supports compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to handle the unpredictability of future carbon intensity, and devises an efficient algorithm to tackle the combinatorial complexity of data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm and show that it outperforms existing methods in learning performance while keeping the carbon footprint within budget.
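To make the drift-plus-penalty idea concrete, the following is a minimal sketch of a generic virtual-queue scheme for a long-term carbon budget. It is not the CAFE algorithm itself. It assumes, purely for illustration, that each data center contributes an additive `utility` score (a stand-in for the coreset-based performance measure, which is combinatorial in the paper), that per-round energy use `energy_kwh` is known, and that only the current round's carbon intensity is observed. All function names, parameters, and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_data_centers(
    utility: Sequence[float],
    energy_kwh: Sequence[float],
    carbon_intensity: Sequence[float],
    queue: float,
    v: float,
) -> List[int]:
    """Pick data centers whose weighted utility exceeds their queue-weighted carbon cost.

    With an additive utility (an assumption of this sketch), maximizing
    v * utility - queue * carbon over subsets decomposes per data center.
    """
    chosen = []
    for i, (u, e, ci) in enumerate(zip(utility, energy_kwh, carbon_intensity)):
        if v * u - queue * e * ci > 0:
            chosen.append(i)
    return chosen


def update_queue(queue: float, emitted: float, per_round_budget: float) -> float:
    """Virtual queue tracking how far emissions have overshot the per-round budget."""
    return max(queue + emitted - per_round_budget, 0.0)


def run(
    utility: Sequence[float],
    energy_kwh: Sequence[float],
    intensity_trace: Sequence[Sequence[float]],
    total_budget: float,
    v: float,
) -> List[Tuple[List[int], float, float]]:
    """Simulate the online selection; returns (chosen, emitted, queue) per round."""
    per_round_budget = total_budget / len(intensity_trace)
    queue = 0.0
    history = []
    for intensity in intensity_trace:
        # Only the current round's carbon intensity is used; no forecast is needed.
        chosen = select_data_centers(utility, energy_kwh, intensity, queue, v)
        emitted = sum(energy_kwh[i] * intensity[i] for i in chosen)
        queue = update_queue(queue, emitted, per_round_budget)
        history.append((chosen, emitted, queue))
    return history


if __name__ == "__main__":
    # Two hypothetical data centers: one in a low-carbon region, one in a high-carbon region.
    trace = [[100.0, 500.0], [100.0, 500.0]]
    for chosen, emitted, queue in run([1.0, 1.0], [1.0, 1.0], trace, 400.0, 50000.0):
        print(chosen, emitted, queue)
```

In this sketch the parameter `v` trades off learning utility against budget adherence: a larger `v` favors selecting more data centers, while a growing queue progressively excludes those in high-carbon regions.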