This paper proposes a client selection (CS) method that tackles the communication bottleneck of federated learning (FL) while also coping with FL's data heterogeneity. Specifically, we first analyze the effect of CS in FL and show that FL training can be accelerated by choosing participants so as to diversify the training dataset in each round. Based on this analysis, we leverage data profiling and determinantal point process (DPP) sampling to develop an algorithm termed Federated Learning with DPP-based Participant Selection (FL-DP$^3$S). The algorithm diversifies the participants' datasets in each round of training while preserving their data privacy. We conduct extensive experiments to examine the efficacy of the proposed method. The results show that our scheme attains faster convergence and lower communication overhead than several baselines.
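To make the idea of diversity-driven participant selection concrete, the following is a minimal sketch and not the FL-DP$^3$S procedure itself. It assumes each client is summarized by a hypothetical profile vector (here, a label histogram, which is only an illustrative choice), builds a cosine-similarity kernel over those profiles, and greedily picks the subset with the largest kernel determinant. Greedy determinant maximization is a deterministic stand-in for DPP sampling, chosen here for brevity; the names `similarity_kernel` and `greedy_diverse_subset` are illustrative.

```python
import numpy as np


def similarity_kernel(profiles):
    """Cosine-similarity Gram matrix over client profile vectors (assumed profiles)."""
    profiles = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    unit = profiles / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def greedy_diverse_subset(kernel, k):
    """Greedily choose up to k indices that maximize det(kernel[S, S]).

    A larger determinant means the chosen profiles are less similar to each
    other, which is the same quantity a DPP assigns high probability to.
    """
    selected = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sub = kernel[np.ix_(idx, idx)]
            # Small diagonal jitter keeps near-singular submatrices well defined.
            sign, logdet = np.linalg.slogdet(sub + 1e-9 * np.eye(len(idx)))
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical label histograms: clients 0, 1, 2, 5 hold only class 0,
# client 3 holds only class 1, client 4 holds only class 2.
profiles = [
    [10, 0, 0],
    [8, 0, 0],
    [12, 0, 0],
    [0, 9, 0],
    [0, 0, 7],
    [5, 0, 0],
]
chosen = greedy_diverse_subset(similarity_kernel(profiles), k=3)
print(chosen)
```

In this toy setting the selected clients should jointly cover all three classes, whereas uniform random selection would often draw several class-0 clients in the same round. A faithful DPP implementation would sample subsets randomly in proportion to the determinant rather than always taking the greedy maximizer.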