Federated learning (FL) has emerged as a key technique for distributed machine learning (ML). Most of the FL literature focuses on ML model training for (i) a single task/model, with (ii) a synchronous scheme for uplink/downlink transfer of model parameters, and (iii) a static data distribution across devices. These assumptions often fail to represent the conditions encountered in practical FL environments. To address this, we develop DMA-FL, which considers dynamic FL with multiple downstream tasks trained over an asynchronous model transmission architecture. We first characterize the convergence of ML model training under DMA-FL by introducing a family of scheduling tensors and rectangular functions that capture the scheduling of devices. Our convergence analysis sheds light on the impact of resource allocation, device scheduling, and individual model states on the performance of ML models. We then formulate a non-convex mixed-integer optimization problem that jointly configures resource allocation and device scheduling to strike an efficient trade-off between energy consumption and ML performance. We develop a solution methodology based on successive convex approximations, with guaranteed convergence to a stationary point. Through numerical simulations, we demonstrate the advantages of DMA-FL in terms of model performance and network resource savings.