FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

Federated learning with noisy labels (F-LNL) aims to learn an optimal server model through collaborative distributed learning, aggregating multiple client models trained on local samples whose labels may be clean or noisy. Building on a federated learning framework, recent advances primarily adopt label noise filtering to separate clean samples from noisy ones on each client, thereby mitigating the negative impact of label noise. However, these prior methods do not exploit knowledge across all clients when learning their noise filters, which leads to sub-optimal noise filtering and, in turn, damages training stability. In this paper, we present FedDiv to tackle the challenges of F-LNL. Specifically, we propose a global noise filter, called Federated Noise Filter, that effectively identifies samples with noisy labels on every client and thereby improves stability during local training sessions. The filter is obtained by modeling the global distribution of label noise across all clients, without sacrificing data privacy. Then, to help the global model achieve higher performance, we introduce a Predictive Consistency based Sampler that identifies more credible local data for local model training, thus preventing noise memorization and further boosting training stability. Extensive experiments on CIFAR-10, CIFAR-100, and Clothing1M demonstrate that FedDiv achieves superior performance over state-of-the-art F-LNL methods under different label noise settings for both IID and non-IID data partitions. Source code is publicly available at https://github.com/lijichang/FLNL-FedDiv.
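
To make the two components above more concrete, the following is a minimal, illustrative sketch and not the released implementation. It assumes, purely for illustration, that label noise is modeled by a two-component Gaussian mixture over per-sample training losses, that clients share only the mixture parameters (never samples or labels), and that the server combines them by a size-weighted average. The abstract does not state any of these choices, and all function names (fit_two_gaussians, aggregate_filters, select_credible) are hypothetical.

```python
import numpy as np

def posterior(x, pi, mu, var):
    """Posterior probability of each mixture component for every loss value."""
    x = np.asarray(x, dtype=float)[:, None]
    log_p = np.log(pi) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
    log_p -= log_p.max(axis=1, keepdims=True)   # subtract the row max for stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_two_gaussians(losses, n_iter=50):
    """Local step (assumed): fit a 1-D two-component mixture to losses with EM."""
    x = np.asarray(losses, dtype=float)
    mu = np.array([x.min(), x.max()])           # start the components far apart
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = posterior(x, pi, mu, var)        # E-step, shape (n, 2)
        nk = resp.sum(axis=0) + 1e-12           # M-step
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    order = np.argsort(mu)                      # component 0 = low loss ("clean")
    return pi[order], mu[order], var[order]

def aggregate_filters(client_params, client_sizes):
    """Server step (assumed): size-weighted average of the clients' parameters."""
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    return tuple(sum(wi * p[k] for wi, p in zip(w, client_params)) for k in range(3))

def select_credible(losses, labels, predictions, global_filter, threshold=0.5):
    """Keep samples the global filter rates as clean AND whose given label
    agrees with the model prediction (a simplified consistency check)."""
    p_clean = posterior(losses, *global_filter)[:, 0]
    return (p_clean > threshold) & (np.asarray(labels) == np.asarray(predictions))

# Synthetic per-sample losses for two clients with different noise rates.
rng = np.random.default_rng(0)
clients = [
    np.concatenate([rng.normal(0.2, 0.05, n_clean), rng.normal(2.0, 0.3, n_noisy)])
    for n_clean, n_noisy in [(80, 20), (50, 50)]
]
params = [fit_two_gaussians(c) for c in clients]
global_filter = aggregate_filters(params, [len(c) for c in clients])
```

In this sketch only three small parameter vectors leave each client, which is one simple way to share noise statistics without exposing raw data. The consistency check is deliberately reduced to label-prediction agreement; the sampler described in the paper may use a different criterion.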
