Federated Learning (FL) provides a privacy-preserving framework for training machine learning models on mobile edge devices. Traditional FL algorithms, e.g., FedAvg, impose a heavy communication workload on these devices. To mitigate this issue, Hierarchical Federated Edge Learning (HFEL) has been proposed, leveraging edge servers as intermediaries for model aggregation. Despite its effectiveness, HFEL faces challenges such as slow convergence and high resource consumption, particularly under system and data heterogeneity. However, existing work mainly focuses on improving the training efficiency of traditional FL, leaving the efficiency of HFEL largely unexplored. In this paper, we consider a two-tier HFEL system, where edge devices are connected to edge servers and edge servers are interconnected through peer-to-peer (P2P) edge backhauls. Our goal is to enhance the training efficiency of the HFEL system through strategic resource allocation and topology design. Specifically, we formulate an optimization problem that minimizes the total training latency by allocating computation and communication resources and adjusting the P2P connections. To ensure convergence under dynamic topologies, we analyze the convergence error bound and introduce a model consensus constraint into the optimization problem. The problem is then decomposed into several subproblems, enabling us to solve it alternately in an online manner. Our method facilitates the efficient deployment of large-scale FL in edge networks under data and system heterogeneity. Comprehensive experimental evaluation on benchmark datasets validates the effectiveness of the proposed method, demonstrating significant reductions in training latency while maintaining model accuracy, compared with various baselines.
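To make the alternating scheme sketched in the abstract concrete, the toy Python sketch below illustrates one round of it: first equalize per-device compute latency under a frequency budget, then greedily prune slow P2P links while a consensus proxy stays above a threshold. Everything here is an illustrative assumption rather than the paper's formulation: the latency model, the frequency budget, the use of the Laplacian's algebraic connectivity as a stand-in for the model consensus constraint, and the greedy pruning rule are all hypothetical.

```python
"""Toy sketch (assumptions, not the paper's algorithm) of one alternating
round: (1) compute-resource allocation, (2) P2P topology adjustment under
a consensus proxy."""
import numpy as np

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def algebraic_connectivity(adj):
    # Second-smallest Laplacian eigenvalue; positive iff the P2P graph
    # is connected. Used here as a crude proxy for a consensus constraint.
    return np.sort(np.linalg.eigvalsh(laplacian(adj)))[1]

def allocate_compute(workloads, budget):
    # Assumed latency model: latency_i = work_i / f_i with sum(f_i) = budget.
    # Equalizing latencies gives f_i proportional to workload_i.
    f = budget * workloads / workloads.sum()
    return workloads / f  # identical per-device compute latency

def prune_links(adj, link_lat, lambda_min):
    # Greedily drop the slowest remaining P2P link while the consensus
    # proxy (algebraic connectivity) stays above lambda_min.
    adj = adj.copy()
    while True:
        links = [(link_lat[i, j], i, j)
                 for i in range(len(adj))
                 for j in range(i + 1, len(adj)) if adj[i, j]]
        if not links:
            break
        _, i, j = max(links)  # slowest link
        trial = adj.copy()
        trial[i, j] = trial[j, i] = 0
        if algebraic_connectivity(trial) < lambda_min:
            break  # removing this link would violate the consensus proxy
        adj = trial
    return adj

rng = np.random.default_rng(0)
workloads = rng.uniform(1.0, 5.0, size=8)        # per-device work (arbitrary units)
lat = allocate_compute(workloads, budget=20.0)   # step 1: resource allocation
adj = np.ones((4, 4)) - np.eye(4)                # 4 edge servers, full P2P mesh
link_lat = rng.uniform(0.1, 1.0, size=(4, 4))
link_lat = np.triu(link_lat, 1) + np.triu(link_lat, 1).T
adj = prune_links(adj, link_lat, lambda_min=0.5) # step 2: topology adjustment
print("per-device compute latency:", lat.round(3))
print("remaining P2P links:", int(adj.sum() // 2))
```

In the paper's actual setting the two steps would be coupled through the total-latency objective and the derived consensus constraint; the sketch only shows the alternating structure.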