As a privacy-preserving paradigm for training Machine Learning (ML) models, Federated Learning (FL) has received tremendous attention from both industry and academia. In a typical FL scenario, clients exhibit significant heterogeneity in both data distribution and hardware configuration. Consequently, randomly sampling clients in each training round may fail to fully exploit the local updates from heterogeneous clients, resulting in lower model accuracy, slower convergence, and degraded fairness, among other problems. To tackle this client heterogeneity problem, various client selection algorithms have been developed and have shown promising performance improvements. In this paper, we systematically review recent advances in the emerging field of FL client selection, along with its challenges and research opportunities. We hope to help practitioners choose the most suitable client selection mechanisms for their applications, and to inspire researchers and newcomers to better understand this exciting research topic.