Federated learning is designed to protect data security and privacy, but it struggles with heterogeneous data that follow long-tailed and non-IID distributions. This paper examines an overlooked scenario in which tail classes are sparsely distributed across a few clients. Models trained on these classes are less likely to be selected during client aggregation, which slows convergence and degrades model performance. To address this issue, we propose a two-stage Decoupled Federated Learning framework using Feature Statistics (DFL-FS). In the first stage, the server estimates each client's class coverage distribution by clustering masked local feature statistics and uses these estimates to select models for aggregation, which accelerates convergence and improves feature learning without privacy leakage. In the second stage, DFL-FS performs federated feature regeneration based on global feature statistics and calibrates the global classifier with resampling and weighted covariance, improving the model's adaptability to long-tailed data distributions. We conducted experiments on the CIFAR10-LT and CIFAR100-LT datasets with various long-tailed rates. The results show that our method outperforms state-of-the-art methods in both accuracy and convergence rate.
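To make the second stage more concrete, the following is a minimal sketch of how feature regeneration from aggregated statistics could look. It is not the DFL-FS implementation. It assumes each client reports, per class, a sample count, a feature mean, and a biased (divide-by-n) feature covariance; it pools these with the standard law of total covariance rather than the paper's weighted covariance, and it approximates resampling by drawing the same number of Gaussian features for every class. The function names `aggregate_class_stats` and `regenerate_features` are hypothetical.

```python
import numpy as np


def aggregate_class_stats(counts, means, covs):
    """Pool per-client statistics of ONE class into a global mean and covariance.

    counts: sequence of per-client sample counts n_k
    means:  sequence of per-client feature means, each of shape (d,)
    covs:   sequence of per-client biased covariances (divided by n_k), each (d, d)
    """
    counts = np.asarray(counts, dtype=float)
    means = np.stack(means)            # (K, d)
    covs = np.stack(covs)              # (K, d, d)
    weights = counts / counts.sum()    # share of the class held by each client

    global_mean = weights @ means      # count-weighted mean, shape (d,)

    # Law of total covariance: within-client spread plus between-client spread.
    centered = means - global_mean
    within = np.einsum("k,kij->ij", weights, covs)
    between = np.einsum("k,ki,kj->ij", weights, centered, centered)
    return global_mean, within + between


def regenerate_features(class_stats, per_class, rng):
    """Draw `per_class` synthetic features for every class (class-balanced).

    class_stats: dict mapping label -> (mean, cov) as returned above
    rng:         a numpy.random.Generator
    """
    feats, labels = [], []
    for label, (mean, cov) in class_stats.items():
        feats.append(rng.multivariate_normal(mean, cov, size=per_class))
        labels.append(np.full(per_class, label))
    return np.concatenate(feats), np.concatenate(labels)
```

Under these assumptions, the server never sees raw features, only per-class summaries, and the balanced synthetic set returned by `regenerate_features` could then be used to retrain or fine-tune the global classifier head so that tail classes are not underrepresented.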