Federated learning (FL) is a distributed framework for collaboratively training models with privacy guarantees. In real-world scenarios, clients may hold non-IID data (local class imbalance) with poor annotation quality (label noise). The co-existence of label noise and class imbalance in FL's small local datasets renders both conventional FL methods and noisy-label learning methods ineffective. To address these challenges, we propose FedCNI, which does not require an additional clean proxy dataset. It comprises a noise-resilient local solver and a robust global aggregator. For the local solver, we design a more robust prototypical noise detector to distinguish noisy samples from clean ones. To further reduce the negative impact of the noisy samples, we devise a curriculum pseudo-labeling method and a denoise Mixup training strategy. For the global aggregator, we propose a switching re-weighted aggregation method tailored to different learning periods. Extensive experiments demonstrate that our method substantially outperforms state-of-the-art solutions in mix-heterogeneous FL environments.
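The abstract names a prototypical noise detector and a switching re-weighted aggregator but does not give their formulas. The following is only a minimal sketch of the general ideas, not FedCNI itself. It assumes (hypothetically) that a sample is flagged as noisy when its feature vector is closer, by cosine similarity, to another class's prototype than to the prototype of its own label, and that aggregation switches at a fixed round from size-based weights to weights scaled by each client's estimated clean fraction. The names `flag_noisy`, `aggregate`, `clean_fracs`, and `switch_round` are illustrative and do not come from the paper.

```python
from math import sqrt


def class_prototypes(features, labels):
    """Return the mean feature vector of each (possibly noisy) labelled class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = sqrt(sum(p * p for p in a))
    nb = sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def flag_noisy(features, labels):
    """Assumed rule: flag a sample whose most similar prototype is not its label's."""
    protos = class_prototypes(features, labels)
    flags = []
    for x, y in zip(features, labels):
        nearest = max(protos, key=lambda c: cosine(x, protos[c]))
        flags.append(nearest != y)
    return flags


def aggregate(client_params, sizes, clean_fracs, round_idx, switch_round):
    """Weighted average of flat parameter lists.

    Assumed schedule: before `switch_round`, weight clients by dataset size
    (FedAvg-style); from `switch_round` on, weight by size times the
    estimated fraction of clean samples.
    """
    if round_idx < switch_round:
        raw = list(sizes)
    else:
        raw = [n * c for n, c in zip(sizes, clean_fracs)]
    total = sum(raw)
    coeffs = [r / total for r in raw]
    dim = len(client_params[0])
    return [sum(c * p[i] for c, p in zip(coeffs, client_params)) for i in range(dim)]


# Toy data: two clusters, with the last sample carrying the wrong label.
features = [[1, 0], [0.9, 0.1], [1, 0.1], [0, 1], [0.1, 0.9], [0.95, 0.05]]
labels = [0, 0, 0, 1, 1, 1]
noisy_flags = flag_noisy(features, labels)

# Two clients of equal size; the second is estimated to be half noisy.
params = [[1.0, 1.0], [3.0, 3.0]]
early = aggregate(params, [10, 10], [1.0, 0.5], round_idx=0, switch_round=5)
late = aggregate(params, [10, 10], [1.0, 0.5], round_idx=5, switch_round=5)
```

In this toy setup only the mislabelled sample is flagged, and after the switch the noisier client contributes one third of the aggregate instead of one half. A real detector would operate on learned feature embeddings and would also need to account for local class imbalance, which this sketch ignores.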