Federated learning (FL) is a distributed learning framework that takes advantage of private data samples kept on edge devices. In real-world federated learning systems, these data samples are often decentralized and non-independent and identically distributed (Non-IID), which causes divergence and performance degradation during training. As a recent solution, clustered federated learning (CFL) groups federated clients with similar data distributions to mitigate the Non-IID effects and train a better model for every cluster. This paper proposes StoCFL, a novel clustered federated learning approach for generic Non-IID issues. In detail, StoCFL implements a flexible CFL framework that supports an arbitrary proportion of client participation and newly joined clients in a varying FL system, while maintaining a substantial improvement in model performance. Extensive experiments are conducted on four basic Non-IID settings and a real-world dataset. The results show that StoCFL obtains promising clustering results even when the number of clusters is unknown. Based on the client clustering results, models trained with StoCFL outperform baseline approaches in a variety of contexts.
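To make the general idea of clustered federated learning concrete, the following is a minimal sketch and not the StoCFL procedure itself. It assumes each client is summarized by a flat update vector, that similarity between data distributions can be approximated by cosine similarity between those vectors, and that a fixed `threshold` decides whether a client joins an existing cluster or starts a new one, so the number of clusters does not have to be known in advance. The names `cluster_clients`, `cluster_models`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two flat vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def cluster_clients(updates, threshold=0.5):
    """Greedy threshold clustering (illustrative assumption, not StoCFL).

    Each client joins the existing cluster whose centroid is most similar
    to its update, provided the similarity reaches `threshold`; otherwise
    it starts a new cluster. Clients that arrive later are handled the
    same way, so the number of clusters is never fixed up front.
    """
    clusters = []  # each entry is a list of client indices
    for i, u in enumerate(updates):
        best, best_sim = None, threshold
        for k, members in enumerate(clusters):
            centroid = np.mean([updates[j] for j in members], axis=0)
            sim = cosine(u, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            clusters.append([i])
        else:
            clusters[best].append(i)
    return clusters


def cluster_models(updates, clusters):
    """Average the updates inside each cluster (one model per cluster)."""
    return [np.mean([updates[j] for j in members], axis=0) for members in clusters]


# Toy data: two groups of clients whose updates point in opposite directions.
updates = [
    np.array([1.0, 0.1]), np.array([0.9, -0.1]), np.array([1.1, 0.0]),
    np.array([-1.0, 0.1]), np.array([-0.9, -0.1]), np.array([-1.1, 0.0]),
]
clusters = cluster_clients(updates, threshold=0.5)
models = cluster_models(updates, clusters)
```

In this toy setting the six clients separate into two groups, and each group receives its own averaged vector. A real system would replace the toy vectors with model updates or gradients computed on each client's private data, and the choice of similarity measure and threshold would need to be tuned for the task.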