Federated learning offers a paradigm for preserving privacy in distributed machine learning. However, the datasets held by real-world clients are inevitably heterogeneous and, when aggregated globally, tend to follow a long-tailed distribution, which severely degrades model performance. Traditional federated learning methods primarily address data heterogeneity among clients, but fail to account for class-wise bias in the globally long-tailed data. As a result, the trained model focuses on the head classes while neglecting the equally important tail classes. It is therefore essential to develop a method that considers all classes holistically. To address these problems, we propose a new method, FedLF, which introduces three modifications in the local training phase: adaptive logit adjustment, continuous class-centered optimization, and feature decorrelation. We compare our method against seven state-of-the-art methods under varying degrees of data heterogeneity and long-tailed distribution. Extensive experiments on the benchmark datasets CIFAR-10-LT and CIFAR-100-LT demonstrate that our approach effectively mitigates the model performance degradation caused by data heterogeneity and long-tailed distribution. Our code is available at https://github.com/18sym/FedLF.
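To give intuition for the first of the three modifications, the sketch below shows standard (non-adaptive) logit adjustment for long-tailed data: each class logit is offset by the log of that class's prior, so that tail classes are not dominated by head classes at training or inference time. This is a generic illustration of the underlying technique, not FedLF's adaptive variant; the function name, `tau` parameter, and example counts are hypothetical.

```python
import numpy as np

def logit_adjust(logits, class_counts, tau=1.0):
    """Standard logit adjustment for long-tailed recognition:
    subtract tau * log(class prior) from each class logit, boosting
    rare (tail) classes relative to frequent (head) classes."""
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()
    return np.asarray(logits, dtype=float) - tau * np.log(priors)

# Hypothetical long-tailed class counts [900, 90, 10]: with equal raw
# logits, the tail class (index 2) receives the largest upward shift.
adjusted = logit_adjust([2.0, 2.0, 2.0], [900, 90, 10])
```

An adaptive scheme such as FedLF's would vary this correction during local training rather than fixing it from static counts.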