Multi-server Federated Learning (FL) has emerged as a promising solution to mitigate the communication bottleneck of single-server FL. In a typical multi-server FL architecture, the regions covered by different edge servers (ESs) may overlap, so clients located in overlapping areas can access edge models from multiple ESs. Building on this observation, we propose a cloud-free multi-server FL framework that leverages Overlapping Clients (OCs) as relays for inter-server model exchange while they upload their locally updated models to the ESs. This enables ES models to be relayed across multiple hops through neighboring ESs without introducing new communication links. We derive a new convergence upper bound for non-convex objectives under non-IID data and an arbitrary number of cells, which explicitly quantifies the impact of inter-server propagation depth on the convergence error. Guided by this theoretical result, we formulate an optimization problem that maximizes the dissemination range of each ES model across all ESs within a limited latency budget. To solve this problem, we develop a conflict-graph-based local search algorithm that jointly optimizes the routing strategy and schedules the transmission times from each ES to its neighboring ESs, achieving the widest possible transmission coverage for each model. Extensive experimental results show remarkable performance gains of our scheme over existing methods.
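To make the conflict-graph idea concrete, the sketch below uses a simplified stand-in for the paper's algorithm: candidate inter-server transmissions are directed links between neighboring ESs, two transmissions conflict when they share an ES endpoint (a hypothetical half-duplex assumption, not taken from the paper), and a greedy slot assignment stands in for the full local search. The `coverage` metric counts how many ESs each model reaches via relays whose slot indices increase along the path, mirroring the dissemination-range objective under a latency budget. All function names and the conflict model here are illustrative assumptions.

```python
from itertools import combinations

def build_conflict_graph(links):
    """Hypothetical conflict model: two relay transmissions conflict
    if they share an ES endpoint (half-duplex assumption)."""
    conflicts = {link: set() for link in links}
    for a, b in combinations(links, 2):
        if set(a) & set(b):  # shared ES endpoint -> cannot be concurrent
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts

def greedy_schedule(links, conflicts, slots):
    """Assign each link to the earliest slot holding no conflicting link;
    links that fit nowhere within the latency budget are dropped.
    (A greedy construction standing in for the paper's local search.)"""
    schedule = {t: [] for t in range(slots)}
    for link in sorted(links, key=lambda l: len(conflicts[l]), reverse=True):
        for t in range(slots):
            if all(link not in conflicts[other] for other in schedule[t]):
                schedule[t].append(link)
                break
    return schedule

def coverage(schedule, n_es, slots):
    """Count (source, destination) ES pairs reached: the model of ES s
    reaches ES d if scheduled links form a path s -> ... -> d whose
    slot indices strictly increase (multi-hop relaying over time)."""
    reached = {s: {s} for s in range(n_es)}
    for t in range(slots):
        snapshot = {s: set(r) for s, r in reached.items()}  # freeze per slot
        for u, v in schedule.get(t, []):
            for s in range(n_es):
                if u in snapshot[s]:
                    reached[s].add(v)
    return sum(len(r) - 1 for r in reached.values())

# Toy topology: three ESs in a line, bidirectional neighbor links.
links = [(0, 1), (1, 0), (1, 2), (2, 1)]
conflicts = build_conflict_graph(links)
schedule = greedy_schedule(links, conflicts, slots=2)
print(schedule, coverage(schedule, n_es=3, slots=2))
```

Because every link in this toy topology touches ES 1, all four transmissions are mutually conflicting, so only one fits per slot; a larger latency budget (more slots) or a sparser conflict graph widens the dissemination range, which is exactly the trade-off the optimization problem navigates.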