Federated learning (FL) is a promising approach that enables distributed clients to collaboratively train a global model while preserving their data privacy. However, FL often suffers from data heterogeneity, which can significantly degrade its performance. To address this, clustered federated learning (CFL) has been proposed to construct personalized models for different client clusters. One effective client clustering strategy is to let each client choose its own local model from a model pool based on model performance. However, without pre-trained model parameters, such a strategy is prone to clustering failure, in which all clients choose the same model. Unfortunately, collecting a large amount of labeled data for pre-training can be costly and impractical in distributed environments. To overcome this challenge, we leverage self-supervised contrastive learning to exploit unlabeled data for the pre-training of FL systems. Together, self-supervised pre-training and client clustering can be crucial components for tackling data heterogeneity in FL. Building on these two strategies, we propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems. We demonstrate the effectiveness of CP-CFL through extensive experiments in heterogeneous FL settings and report several observations from these experiments.
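The selection-based clustering strategy and its failure mode can be illustrated with a minimal sketch. This is not the CP-CFL algorithm itself: the scalar "models", the squared-error loss, and the names `local_loss`, `select_model`, `cluster_clients`, and `clustering_failed` are all hypothetical, chosen only to show how clients pick from a model pool and how undifferentiated models can lead every client to the same choice.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model: maps an input x to a prediction.
Model = Callable[[float], float]
Dataset = List[Tuple[float, float]]  # local (x, y) pairs held by one client


def local_loss(model: Model, data: Dataset) -> float:
    """Mean squared error of `model` on one client's local data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_model(pool: Sequence[Model], data: Dataset) -> int:
    """Index of the pool model with the lowest local loss (ties go to the lowest index)."""
    losses = [local_loss(m, data) for m in pool]
    return min(range(len(pool)), key=losses.__getitem__)


def cluster_clients(pool: Sequence[Model], clients: Dict[str, Dataset]) -> Dict[int, List[str]]:
    """Group client ids by the pool model each client selected."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for client_id, data in clients.items():
        clusters[select_model(pool, data)].append(client_id)
    return dict(clusters)


def clustering_failed(clusters: Dict[int, List[str]], n_models: int) -> bool:
    """Clustering has collapsed if several models exist but every client chose the same one."""
    return n_models > 1 and len(clusters) == 1


# Three clients with heterogeneous data: "a" and "c" follow y ≈ 2x, "b" follows y = -2x.
clients = {
    "a": [(1.0, 2.0), (2.0, 4.0)],
    "b": [(1.0, -2.0), (2.0, -4.0)],
    "c": [(1.0, 1.9), (2.0, 3.8)],
}

# A pool whose models are already differentiated separates the clients.
differentiated_pool = [lambda x: 2.0 * x, lambda x: -2.0 * x]
clusters = cluster_clients(differentiated_pool, clients)

# A pool of indistinguishable models (an assumed stand-in for uninformative
# initial parameters) gives every client the same choice.
uninformative_pool = [lambda x: 0.0, lambda x: 0.0]
collapsed = cluster_clients(uninformative_pool, clients)

print(clusters, clustering_failed(clusters, len(differentiated_pool)))
print(collapsed, clustering_failed(collapsed, len(uninformative_pool)))
```

In this toy setting the collapse comes from exact ties between identical models. In practice it may instead arise when one poorly initialized model happens to be slightly better for all clients, which is the situation that pre-training the pool is meant to avoid.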