Federated learning (FL) has emerged as a key technique for distributed machine learning (ML). Most of the FL literature has focused on systems with (i) ML model training for a single task/model and (ii) synchronous uplink/downlink transfer of model parameters, an assumption that is often unrealistic in practice. To address these limitations, we develop MA-FL, which considers FL with multiple downstream tasks trained over an asynchronous model transmission architecture. We first characterize the convergence of ML model training under MA-FL by introducing a family of scheduling tensors that capture the scheduling of devices. Our convergence analysis sheds light on how resource allocation (e.g., the mini-batch size and the number of gradient descent iterations), device scheduling, and individual model states (i.e., warm vs. cold initialization) affect the performance of the ML models. We then formulate a non-convex mixed-integer optimization problem that jointly configures resource allocation and device scheduling to strike an efficient trade-off between energy consumption and ML performance, and we solve it via successive convex approximations. Through numerical simulations, we demonstrate the advantages of MA-FL in terms of model performance and network resource savings.