Federated learning (FL) algorithms commonly sample a random subset of clients to mitigate the straggler problem and improve communication efficiency. Although recent works have proposed various client sampling methods, they are limited in how they jointly handle system and data heterogeneity, and so may not suit practical heterogeneous wireless networks. In this work, we advocate a new independent client sampling strategy that minimizes the wall-clock training time of FL while accounting for data heterogeneity and for system heterogeneity in both communication and computation. We first derive a new convergence bound for non-convex loss functions under independent client sampling and then propose an adaptive bandwidth allocation scheme. Building on upper bounds on the number of convergence rounds and on the expected per-round training time, we further propose an efficient independent client sampling algorithm that minimizes the wall-clock time of FL. Experimental results under practical wireless network settings with a real-world prototype demonstrate that the proposed independent sampling scheme substantially outperforms the current best sampling schemes across various training models and datasets.
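To make the idea of independent client sampling concrete, the following is a minimal sketch and not the algorithm proposed in this work. It assumes that each client i joins a round independently with its own probability q_i, and that the server rescales each received update by p_i / q_i (inverse-probability weighting, a standard way to keep the aggregate unbiased). The function names `sample_clients` and `aggregate`, and all numeric values, are hypothetical and chosen only for illustration; how the probabilities and the bandwidth allocation are optimized is not shown.

```python
import random


def sample_clients(probs, rng):
    """Include client i independently with probability probs[i].

    Unlike fixed-size sampling, the number of participants varies per round.
    """
    return [i for i, q in enumerate(probs) if rng.random() < q]


def aggregate(updates, weights, probs, sampled):
    """Inverse-probability-weighted aggregate of the sampled updates.

    Assumption (not taken from the abstract): dividing by probs[i] makes the
    result an unbiased estimate of sum_i weights[i] * updates[i].
    """
    dim = len(updates[0])
    agg = [0.0] * dim
    for i in sampled:
        scale = weights[i] / probs[i]
        for d in range(dim):
            agg[d] += scale * updates[i][d]
    return agg


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[1.0], [2.0], [3.0]]   # hypothetical one-dimensional client updates
    weights = [0.2, 0.3, 0.5]         # hypothetical data-size weights p_i
    probs = [0.5, 0.8, 0.3]           # hypothetical sampling probabilities q_i
    sampled = sample_clients(probs, rng)
    print("sampled clients:", sampled)
    print("aggregate:", aggregate(updates, weights, probs, sampled))
```

A client with a slow link or little compute could be given a small q_i so that it rarely delays a round, at the cost of a larger rescaling factor and hence higher variance when it does participate. Balancing these two effects is the trade-off between per-round time and the number of rounds that the abstract describes.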