This study investigates clustered federated learning (FL), a formulation of FL for non-i.i.d. data in which devices are partitioned into clusters and each cluster optimally fits its data with a localized model. We propose a clustered FL framework that applies a nonconvex penalty to the pairwise differences of device parameters. The framework identifies cluster structures automatically, without a priori knowledge of the number of clusters or of which devices belong to each cluster. To implement it, we introduce a novel clustered FL method called Fusion Penalized Federated Clustering (FPFC). Building on the standard alternating direction method of multipliers (ADMM), FPFC runs in parallel, updates only a subset of devices at each communication round, and allows a variable workload per device. These strategies significantly reduce communication cost while ensuring privacy, which makes FPFC practical for FL. We also propose a new warmup strategy for hyperparameter tuning in FL settings and explore an asynchronous variant of FPFC (asyncFPFC). Theoretical analysis provides convergence guarantees for FPFC with general nonconvex losses and establishes the statistical convergence rate under a linear model with squared loss. Extensive experiments demonstrate the advantages of FPFC over existing methods, including robustness and generalization capability.
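The idea of penalizing pairwise parameter differences can be illustrated with a small sketch. The abstract does not say which nonconvex penalty is used, so the minimax concave penalty (MCP) below is an assumption chosen only as a common example, and the names `mcp`, `fused_objective`, and `clusters_from_params` are hypothetical. The sketch evaluates a fusion-penalized objective with squared loss and reads clusters off the fitted parameters; it does not implement the ADMM-based FPFC solver.

```python
import numpy as np


def mcp(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty (MCP) of |t|.

    Assumption: MCP stands in for the unspecified nonconvex penalty.
    It grows like lam*|t| near zero and is constant beyond gamma*lam,
    so large differences between clusters are not shrunk further.
    """
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)


def fused_objective(W, X, y, lam=1.0, gamma=3.0):
    """Per-device squared losses plus a penalty on pairwise distances.

    W: (m, d) array, one parameter vector per device.
    X: list of m arrays of shape (n_i, d); y: list of m arrays (n_i,).
    """
    m = W.shape[0]
    loss = sum(0.5 * np.mean((X[i] @ W[i] - y[i]) ** 2) for i in range(m))
    penalty = sum(float(mcp(np.linalg.norm(W[i] - W[j]), lam, gamma))
                  for i in range(m) for j in range(i + 1, m))
    return loss + penalty


def clusters_from_params(W, tol=1e-3):
    """Label devices by connected components of the graph that links
    two devices when their parameter vectors are within tol."""
    m = W.shape[0]
    labels = [-1] * m
    next_label = 0
    for i in range(m):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            u = stack.pop()
            for v in range(m):
                if labels[v] == -1 and np.linalg.norm(W[u] - W[v]) <= tol:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Three devices; the first two share parameters, the third differs.
    W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    X = [np.eye(2) for _ in range(3)]
    y = [W[i].copy() for i in range(3)]
    print(clusters_from_params(W))
    print(fused_objective(W, X, y))
```

Because the penalty drives the parameters of similar devices to coincide, the number of clusters does not have to be supplied: it is simply the number of distinct groups found in the fitted parameters.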