Federated noisy label learning (FNLL) is emerging as a promising tool for privacy-preserving multi-source decentralized learning. Existing research relies on the assumption of class-balanced global data and may therefore be unable to model complicated label noise, especially in medical scenarios. In this paper, we first formulate a new and more realistic federated label noise problem in which global data is class-imbalanced and label noise is heterogeneous. We then propose a two-stage framework named FedNoRo for noise-robust federated learning (FL). Specifically, in the first stage of FedNoRo, per-class loss indicators followed by a Gaussian Mixture Model are deployed to identify noisy clients. In the second stage, knowledge distillation and a distance-aware aggregation function are jointly adopted for noise-robust federated model updating. Experimental results on the widely used ICH and ISIC2019 datasets demonstrate the superiority of FedNoRo over state-of-the-art FNLL methods in addressing class imbalance and label noise heterogeneity in real-world FL scenarios.
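The abstract does not specify how the per-class loss indicators are built or normalized, so the following is only a minimal sketch of the first-stage idea under stated assumptions. Each client is assumed to report (label, loss) pairs computed by the current global model on its local data. The indicator is assumed to be the mean loss per class, scaled by the per-class maximum over clients. A two-component Gaussian Mixture Model is then fitted, and the component with the higher mean loss is assumed to correspond to noisy clients. The helper names (`per_class_loss_indicators`, `fit_gmm2`, `identify_noisy_clients`) and the 0.5 responsibility threshold are hypothetical. To keep the example dependency-free, a small diagonal-covariance GMM is fitted with EM in pure Python rather than with a library such as scikit-learn.

```python
import math


def per_class_loss_indicators(client_losses, num_classes):
    """Mean loss per class for each client (assumed input: list of (label, loss) pairs)."""
    indicators = []
    for pairs in client_losses:
        sums, counts = [0.0] * num_classes, [0] * num_classes
        for label, loss in pairs:
            sums[label] += loss
            counts[label] += 1
        # Classes absent on a client get 0.0 (an assumption of this sketch).
        indicators.append([s / c if c else 0.0 for s, c in zip(sums, counts)])
    # Hypothetical normalization: divide each class column by its max over clients,
    # so rare and common classes contribute on a comparable scale.
    for k in range(num_classes):
        col_max = max(v[k] for v in indicators) or 1.0
        for v in indicators:
            v[k] /= col_max
    return indicators


def fit_gmm2(points, iters=100, eps=1e-6):
    """Fit a 2-component diagonal-covariance GMM with EM; return (means, responsibilities)."""
    n, dim = len(points), len(points[0])
    # Initialize the two means at the lowest- and highest-loss clients.
    means = [list(min(points, key=sum)), list(max(points, key=sum))]
    mu = [sum(p[d] for p in points) / n for d in range(dim)]
    var0 = [sum((p[d] - mu[d]) ** 2 for p in points) / n + eps for d in range(dim)]
    variances, weights = [list(var0), list(var0)], [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibilities from log-densities (log-sum-exp for stability).
        resp = []
        for p in points:
            logs = []
            for j in range(2):
                lp = math.log(weights[j])
                for d in range(dim):
                    v = variances[j][d]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (p[d] - means[j][d]) ** 2 / v)
                logs.append(lp)
            top = max(logs)
            ex = [math.exp(l - top) for l in logs]
            resp.append([e / sum(ex) for e in ex])
        # M-step: re-estimate mixture weights, means, and diagonal variances.
        for j in range(2):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj for d in range(dim)]
            variances[j] = [
                sum(r[j] * (p[d] - means[j][d]) ** 2 for r, p in zip(resp, points)) / nj + eps
                for d in range(dim)
            ]
    return means, resp


def identify_noisy_clients(client_losses, num_classes):
    """Return indices of clients assigned to the higher-loss GMM component."""
    points = per_class_loss_indicators(client_losses, num_classes)
    means, resp = fit_gmm2(points)
    noisy = 0 if sum(means[0]) > sum(means[1]) else 1
    return [i for i, r in enumerate(resp) if r[noisy] > 0.5]


# Synthetic demo: clients 4 and 5 have much higher losses on every class.
demo = []
for i in range(6):
    base = 1.5 if i >= 4 else 0.2
    demo.append([(k % 3, base + 0.01 * ((i + k) % 5)) for k in range(30)])
print(identify_noisy_clients(demo, num_classes=3))  # prints [4, 5]
```

Using per-class rather than overall mean loss is meant to keep minority-class noise visible under class imbalance; how FedNoRo itself weights or normalizes classes should be checked against the full paper.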