Federated learning (FL) is a popular distributed machine learning (ML) technique. However, implementing FL over 5G-and-beyond wireless networks faces key challenges caused by (i) the dynamics of wireless network conditions and (ii) the coexistence of multiple FL services in the system, which prior work has not considered jointly. We first take a closer look at these challenges and unveil two nuanced phenomena: over-/under-provisioning of resources and perspective-driven load balancing. We then take the first steps towards addressing these phenomena by proposing a novel distributed ML architecture called elastic FL (EFL). EFL exploits the full potential of Open RAN (O-RAN) systems and introduces an elastic resource provisioning methodology for executing FL services. It further forms a multi-time-scale FL management system that introduces three dedicated network control functionalities tailored for FL services: (i) a non-real-time (non-RT) system descriptor, which trains ML-based applications to predict both system and FL-related dynamics and parameters; (ii) a near-RT FL controller, which handles O-RAN slicing and mobility management for the seamless execution of FL services; and (iii) an FL MAC scheduler, which conducts real-time resource allocation to the end clients of the various FL services. Finally, we prototype EFL to demonstrate its potential for improving the performance of FL services.