Federated learning (FL) is an emerging distributed machine learning paradigm that enables devices to train a model collaboratively without exposing their local data. A key challenge in FL is that data are unevenly distributed across client devices. This violates the assumption, standard in conventional machine learning, that training samples are independent and identically distributed (IID). Clustered federated learning (CFL) addresses this challenge by grouping clients according to the similarity of their data distributions. However, existing CFL approaches need many communication rounds before clusters stabilize, and they rely on a predefined number of clusters, which limits their flexibility and adaptability. This paper proposes FedClust, a novel CFL approach that leverages the correlation between local model weights and client data distributions. FedClust groups clients into clusters in a one-shot manner using strategically selected partial model weights, and it accommodates newly joining clients dynamically in real time. Experimental results demonstrate that FedClust outperforms baseline approaches in both accuracy and communication cost.
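To make "one-shot clustering on partial weights" concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes each client uploads one flattened slice of its locally trained weights (for example, the final layer) and that the server groups clients in a single pass, with no preset cluster count. For that pass it uses single-linkage agglomerative clustering cut at a distance threshold, which is a swapped-in choice. Newcomers go to the cluster with the nearest centroid. The function names `one_shot_cluster` and `assign_newcomer`, the `threshold` parameter, and the toy weight vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_cluster(partial_weights: Dict[str, Sequence[float]],
                     threshold: float) -> List[List[str]]:
    """Hypothetical one-shot grouping (single-linkage cut at `threshold`).

    Two clients share a cluster if their partial-weight vectors are within
    `threshold` of each other, directly or through a chain of clients.
    The number of clusters is not fixed in advance; it follows from the data.
    """
    ids = sorted(partial_weights)
    parent = {cid: cid for cid in ids}  # union-find forest

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if euclidean(partial_weights[a], partial_weights[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for cid in ids:
        groups.setdefault(find(cid), []).append(cid)
    return sorted(groups.values())


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign_newcomer(new_weights: Sequence[float],
                    clusters: List[List[str]],
                    partial_weights: Dict[str, Sequence[float]]) -> int:
    """Hypothetical newcomer rule: index of the cluster with the nearest centroid."""
    best, best_dist = 0, math.inf
    for idx, members in enumerate(clusters):
        d = euclidean(new_weights, centroid([partial_weights[m] for m in members]))
        if d < best_dist:
            best, best_dist = idx, d
    return best


# Toy example: two groups of clients with visibly different partial weights.
weights = {"c1": [0.0, 0.1], "c2": [0.05, 0.12], "c3": [1.0, 0.9], "c4": [0.98, 1.02]}
clusters = one_shot_cluster(weights, threshold=0.2)   # → [['c1', 'c2'], ['c3', 'c4']]
newcomer_idx = assign_newcomer([0.02, 0.08], clusters, weights)  # → 0
```

The distance threshold replaces a fixed cluster count, so the number of clusters follows from how far apart the clients' weight slices are. Assigning a newcomer to the nearest centroid only requires that client's own partial weights. Existing clients do not need to be re-clustered.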