WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

As a popular distributed learning paradigm, federated learning (FL) over mobile devices enables numerous applications, but its practical deployment is hindered by the heterogeneous computing and communication capabilities of participating devices. Some pioneering works proposed extracting subnetworks from the global model and assigning each device the largest subnetwork it can train locally given its full computing and communication capacity. Although such fixed-size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) dynamic changes in devices' communication and computing conditions and (ii) the FL training progress and its changing requirements for local training contributions, both of which can substantially prolong FL training. Motivated by these dynamics, in this paper we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach that accelerates FL training through adaptive subnetwork scheduling. Instead of sticking to a fixed-size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function that captures device and FL training dynamics and guides each mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) the FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
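The abstract does not state the form of WHALE-FL's subnetwork selection utility function, so the following is only a minimal sketch of the contrast it describes: a fixed-size assignment chosen once from a device's nominal capacity versus a per-round choice that reacts to current conditions and training progress. Everything here is a hypothetical assumption, not the paper's method: the candidate width ratios, the assumption that compute and model size scale with `ratio**2`, the utility `weight * r - latency / deadline`, the progress-dependent `weight`, and all numbers.

```python
from dataclasses import dataclass

# Hypothetical candidate width ratios for subnetworks extracted from the global model.
CANDIDATE_RATIOS = (0.25, 0.5, 0.75, 1.0)


@dataclass
class DeviceState:
    flops_per_sec: float  # current effective compute speed (may change every round)
    uplink_bps: float     # current uplink rate (may change every round)


def round_latency(ratio: float, state: DeviceState,
                  full_flops: float, full_bits: float) -> float:
    """Estimated local training time plus upload time for one round.

    Assumption: compute and model size both scale with ratio**2,
    as for width-scaled dense layers.
    """
    frac = ratio ** 2
    return frac * full_flops / state.flops_per_sec + frac * full_bits / state.uplink_bps


def fixed_size_ratio(nominal: DeviceState, full_flops: float,
                     full_bits: float, deadline: float) -> float:
    """Baseline: largest subnetwork meeting the deadline at nominal capacity,
    chosen once and then kept for every round."""
    feasible = [r for r in CANDIDATE_RATIOS
                if round_latency(r, nominal, full_flops, full_bits) <= deadline]
    return max(feasible, default=min(CANDIDATE_RATIOS))


def adaptive_ratio(current: DeviceState, progress: float, full_flops: float,
                   full_bits: float, deadline: float) -> float:
    """Per-round choice maximizing a hypothetical utility (not WHALE-FL's):
    weighted contribution proxy minus normalized latency."""
    weight = 0.5 + progress  # hypothetical: contribution matters more as training advances
    best_ratio, best_utility = min(CANDIDATE_RATIOS), float("-inf")
    for r in CANDIDATE_RATIOS:
        latency = round_latency(r, current, full_flops, full_bits)
        if latency > deadline:
            continue  # skip subnetworks that would miss the deadline under current conditions
        utility = weight * r - latency / deadline  # r as a diminishing-returns contribution proxy
        if utility > best_utility:
            best_ratio, best_utility = r, utility
    return best_ratio


# Illustrative numbers only.
FULL_FLOPS, FULL_BITS, DEADLINE = 1e12, 8e8, 20.0
nominal = DeviceState(flops_per_sec=1e11, uplink_bps=1e8)
degraded = DeviceState(flops_per_sec=1e11, uplink_bps=2e7)  # uplink drops 5x

print(fixed_size_ratio(nominal, FULL_FLOPS, FULL_BITS, DEADLINE))  # 1.0
print(round_latency(1.0, degraded, FULL_FLOPS, FULL_BITS))  # 50.0, misses the 20 s deadline
for p in (0.0, 0.5, 1.0):
    print(p, adaptive_ratio(nominal, p, FULL_FLOPS, FULL_BITS, DEADLINE))  # 0.25, 0.5, 0.75
print(adaptive_ratio(degraded, 1.0, FULL_FLOPS, FULL_BITS, DEADLINE))  # 0.25
```

In this toy setting the fixed-size baseline keeps the full model even after the uplink degrades, so its round latency rises to 50 s, while the adaptive rule grows the subnetwork as training progresses and shrinks it when conditions worsen.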
