Federated learning (FL) is a viable technique for training a shared machine learning model without sharing data. Hierarchical FL (HFL) systems have yet to be studied regarding their multiple levels of energy, computation, communication, and client scheduling, especially when clients rely on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic policy gradient (DDPG) framework, referred to as ``TP-DDPG'', to balance the learning delay and model accuracy of an FL process online in an energy harvesting-powered HFL system. The key idea is to divide the optimization decisions into two groups: the DDPG learns one group in the first phase, while the other group is interpreted as part of the environment and provides the reward for training the DDPG in the second phase. Specifically, the DDPG learns the selection of participating clients, their CPU configurations, and their transmission powers. A new straggler-aware client association and bandwidth allocation (SCABA) algorithm efficiently optimizes the remaining decisions and evaluates the reward for the DDPG. Experiments demonstrate that, with a substantially reduced number of learnable parameters, TP-DDPG quickly converges to effective policies that shorten the training time of HFL by 39.4% compared to its benchmarks, when the required test accuracy of HFL is 0.9.
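The two-phase interaction described above can be sketched in toy form. This is a minimal illustrative sketch, not the paper's implementation: the names (`ToyAgent`, `scaba`), the delay models, and the random policy are all assumptions; a real system would replace `ToyAgent` with a DDPG actor-critic performing gradient updates, and `scaba` with the straggler-aware client association and bandwidth allocation algorithm.

```python
import random

random.seed(0)
NUM_CLIENTS = 8

def scaba(selected, cpu_freqs, tx_powers):
    """Second-phase stand-in: given the first-phase decisions, the paper's
    SCABA algorithm would optimize client association and bandwidth and
    return a reward. Here, the round delay is simply the slowest client's
    compute-plus-upload time under toy models."""
    delays = []
    for c in selected:
        compute = 1.0 / cpu_freqs[c]          # toy model: work / CPU frequency
        upload = 1.0 / (1.0 + tx_powers[c])   # toy model: rate grows with power
        delays.append(compute + upload)
    round_delay = max(delays)                 # the straggler dominates the round
    return -round_delay                       # reward: shorter delay is better

class ToyAgent:
    """First-phase stand-in for the DDPG actor: proposes the client
    selection, CPU configurations, and transmission powers."""
    def act(self):
        selected = [c for c in range(NUM_CLIENTS) if random.random() < 0.5] or [0]
        cpu_freqs = {c: random.uniform(0.5, 2.0) for c in selected}
        tx_powers = {c: random.uniform(0.1, 1.0) for c in selected}
        return selected, cpu_freqs, tx_powers

agent = ToyAgent()
best_reward = float("-inf")
for episode in range(200):
    selected, cpu_freqs, tx_powers = agent.act()
    reward = scaba(selected, cpu_freqs, tx_powers)  # environment feedback
    best_reward = max(best_reward, reward)  # a real DDPG would take a gradient step here
```

The key structural point the sketch captures is the decomposition: the agent's action space covers only the first group of decisions, which keeps the number of learnable parameters small, while the second group is solved algorithmically inside the environment and surfaces to the agent only through the reward.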