Federated learning (FL) algorithms commonly sample a random subset of clients to mitigate the straggler issue and improve communication efficiency. Although recent works have proposed various client sampling methods, they are limited in how they jointly handle system and data heterogeneity, and may therefore be a poor fit for practical heterogeneous wireless networks. In this work, we advocate a new independent client sampling strategy to minimize the wall-clock training time of FL while accounting for data heterogeneity and for system heterogeneity in both communication and computation. We first derive a new convergence bound for non-convex loss functions under independent client sampling and then propose an adaptive bandwidth allocation scheme. Building on upper bounds on the number of rounds to convergence and on the expected per-round training time, we further propose an efficient independent client sampling algorithm to minimize the wall-clock time of FL. Experimental results under practical wireless network settings with a real-world prototype demonstrate that the proposed independent sampling scheme substantially outperforms the current best sampling schemes across various training models and datasets.
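As a rough illustration of what independent client sampling means in general, the sketch below includes each client in a round independently with its own probability and rescales each sampled update by its data weight divided by its inclusion probability, so that the aggregate equals the full weighted sum in expectation. This is a generic inverse-probability-weighting sketch, not the algorithm proposed in this work: the function names `sample_clients` and `aggregate`, the list-based update format, and the example probabilities are all assumptions made for illustration, and the bandwidth allocation and probability optimization described above are not modeled.

```python
import random


def sample_clients(probs, rng):
    """Include client i independently with probability probs[i].

    Hypothetical helper: each client is an independent Bernoulli draw,
    so the number of sampled clients varies from round to round.
    """
    return [i for i, q in enumerate(probs) if rng.random() < q]


def aggregate(updates, weights, probs, sampled):
    """Sum sampled updates, each scaled by weights[i] / probs[i].

    With independent inclusion, this inverse-probability scaling makes the
    expected aggregate equal to sum_i weights[i] * updates[i].
    """
    dim = len(updates[0])
    total = [0.0] * dim
    for i in sampled:
        scale = weights[i] / probs[i]
        for k in range(dim):
            total[k] += scale * updates[i][k]
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # made-up client updates
    weights = [0.2, 0.3, 0.5]                       # assumed data weights
    probs = [0.9, 0.5, 0.7]                         # assumed inclusion probabilities
    sampled = sample_clients(probs, rng)
    print(sampled, aggregate(updates, weights, probs, sampled))
```

In a sketch like this, a slow client can be given a small inclusion probability without biasing the aggregate, at the cost of higher variance whenever it is selected; trading that variance against per-round time is the kind of balance the abstract refers to.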