WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

Federated learning (FL) over mobile devices, a popular distributed learning paradigm, fosters numerous applications, but its practical deployment is hindered by the computing and communication heterogeneity of participating devices. Pioneering research efforts have proposed extracting subnetworks from the global model and assigning each device the largest subnetwork that its full computing and communication capacity allows for local training. Although such fixed-size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) dynamic changes in devices' communication and computing conditions and (ii) FL training progress and its changing requirements for local training contributions; both can substantially prolong FL training. Motivated by these dynamics, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach that accelerates FL training through adaptive subnetwork scheduling. Instead of fixing the subnetwork size, WHALE-FL introduces a novel subnetwork selection utility function that captures device and FL training dynamics and guides each mobile device to adaptively select its subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and the corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
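The abstract does not give the form of the subnetwork selection utility function, so the following Python sketch only illustrates the general idea of utility-guided, per-round subnetwork-size selection. Everything in it is a hypothetical assumption rather than WHALE-FL's actual formulation: the linear latency model, the concave contribution term, the progress-dependent weight schedule, the optional per-round deadline, and all names (`DeviceState`, `GlobalModel`, `round_latency`, `utility`, `select_width`).

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of one device's current (possibly fluctuating) conditions."""
    compute_flops_per_s: float  # computing speed currently available for training
    uplink_bits_per_s: float    # current wireless uplink rate


@dataclass
class GlobalModel:
    """Hypothetical cost figures for the full-width (1.0) global model."""
    train_flops: float          # FLOPs of one round of local training
    num_params: int             # parameters uploaded after local training
    bits_per_param: int = 32


def round_latency(width: float, device: DeviceState, model: GlobalModel) -> float:
    """Estimated per-round latency (s) of a subnetwork keeping a `width` fraction of
    the global model. Assumes compute and upload cost both scale linearly with width."""
    compute_s = width * model.train_flops / device.compute_flops_per_s
    upload_s = width * model.num_params * model.bits_per_param / device.uplink_bits_per_s
    return compute_s + upload_s


def utility(width, device, model, progress, latency_ref):
    """Hypothetical utility: weighted contribution reward minus weighted latency penalty."""
    progress = min(max(progress, 0.0), 1.0)           # training progress clamped to [0, 1]
    contribution = math.log2(1.0 + width)             # concave; equals 1.0 at full width
    penalty = round_latency(width, device, model) / latency_ref
    weight = 0.2 + 0.6 * progress                     # assumed: contribution matters more later
    return weight * contribution - (1.0 - weight) * penalty


def select_width(candidates, device, model, progress, latency_ref, deadline=None):
    """Return the candidate width with the highest utility among those meeting the
    optional per-round deadline; fall back to the smallest width if none does."""
    feasible = [w for w in candidates
                if deadline is None or round_latency(w, device, model) <= deadline]
    if not feasible:
        return min(candidates)
    return max(feasible, key=lambda w: utility(w, device, model, progress, latency_ref))


if __name__ == "__main__":
    widths = [0.25, 0.5, 0.75, 1.0]
    model = GlobalModel(train_flops=1e12, num_params=10_000_000)
    good = DeviceState(compute_flops_per_s=1e11, uplink_bits_per_s=3.2e7)  # full model: 10 s + 10 s
    poor = DeviceState(compute_flops_per_s=1e11, uplink_bits_per_s=8e6)    # uplink drops 4x
    print(select_width(widths, good, model, progress=0.0, latency_ref=20.0))  # 0.25
    print(select_width(widths, good, model, progress=1.0, latency_ref=20.0))  # 1.0
    print(select_width(widths, poor, model, progress=1.0, latency_ref=20.0, deadline=30.0))  # 0.5
```

In this toy setting the same device picks a small subnetwork early in training, the full model late in training, and a smaller one again once its uplink degrades and the full model would miss the deadline. This loosely mirrors the three factors the abstract lists (capacity, dynamic conditions, training status), though the real utility function may weigh them quite differently.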
