Federated learning (FL) is a promising distributed learning framework for privacy-aware applications on resource-constrained devices. Without sharing data, FL trains a model locally on each device and builds the global model on the server by aggregating the locally trained models. To reduce communication overhead, only a subset of client devices participates in each training round. Random selection is the most common way of choosing the participating clients in a round of FL. However, random client selection uses distributed data and computational resources inefficiently, because it ignores both the clients' hardware specifications and the distribution of data among them. This paper proposes FedGRA, an adaptive, fair client selection algorithm designed for FL applications with unbalanced, non-independent and identically distributed (non-IID) data running on client devices with heterogeneous computing resources. FedGRA dynamically adjusts the set of selected clients in each training round based on the clients' trained models and their available computational resources. To find an optimal solution, we model client selection in FL as a multi-objective optimization problem using Grey Relational Analysis (GRA) theory. To evaluate the proposed method, we implement it on Amazon Web Services (AWS) using 50 Elastic Compute Cloud (EC2) instances with 4 different hardware configurations. The evaluation results show that FedGRA significantly improves convergence and reduces the clients' average waiting time compared with state-of-the-art methods.
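To make the GRA-based ranking idea concrete, the following is a minimal sketch of textbook Grey Relational Analysis applied to per-client metrics. It is not FedGRA itself: the abstract does not specify the criteria, weights, or fairness mechanism, so the two metrics used here (local training loss and round time), the preference directions, the equal weights, the distinguishing coefficient `rho = 0.5`, and the function names `grey_relational_grades` and `select_clients` are all illustrative assumptions.

```python
from typing import List, Optional, Sequence


def grey_relational_grades(
    metrics: Sequence[Sequence[float]],
    larger_is_better: Sequence[bool],
    weights: Optional[Sequence[float]] = None,
    rho: float = 0.5,  # distinguishing coefficient; 0.5 is a common default
) -> List[float]:
    """Return one grey relational grade per client (row of `metrics`)."""
    n_clients = len(metrics)
    n_criteria = len(larger_is_better)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / n_criteria] * n_criteria

    # Step 1: min-max normalise each criterion so that 1.0 is always "best".
    normalized = [[0.0] * n_criteria for _ in range(n_clients)]
    for j in range(n_criteria):
        column = [row[j] for row in metrics]
        lo, hi = min(column), max(column)
        span = hi - lo
        for i in range(n_clients):
            if span == 0:
                normalized[i][j] = 1.0  # all clients tie on this criterion
            elif larger_is_better[j]:
                normalized[i][j] = (metrics[i][j] - lo) / span
            else:
                normalized[i][j] = (hi - metrics[i][j]) / span

    # Step 2: deviation from the ideal reference sequence (all ones).
    deltas = [[1.0 - value for value in row] for row in normalized]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)

    # Step 3: grey relational coefficients, then their weighted sum per client.
    grades = []
    for row in deltas:
        if d_max == 0:
            coefficients = [1.0] * n_criteria
        else:
            coefficients = [
                (d_min + rho * d_max) / (d + rho * d_max) for d in row
            ]
        grades.append(sum(w * c for w, c in zip(weights, coefficients)))
    return grades


def select_clients(grades: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k clients with the highest grades."""
    ranked = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical per-client metrics: [local training loss, round time in s].
    # Assumption: higher loss is preferred (more to learn), lower time is better.
    client_metrics = [[0.9, 10.0], [0.5, 30.0], [0.2, 12.0]]
    grades = grey_relational_grades(client_metrics, [True, False])
    print(select_clients(grades, k=2))
```

A pure top-k ranking like this one would repeatedly pick the same clients; an adaptive and fair scheme such as the one the abstract describes would need an additional mechanism on top, which this sketch deliberately omits.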