FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels

Federated learning with noisy labels (F-LNL) seeks an optimal server model through collaborative distributed learning, aggregating multiple client models trained on local samples whose labels may be noisy or clean. Building on a federated learning framework, recent methods primarily use label noise filtering to separate clean samples from noisy ones on each client, thereby mitigating the negative impact of label noise. However, these methods do not exploit knowledge across all clients when learning their noise filters, which leads to sub-optimal filtering performance and, in turn, damages training stability. In this paper, we present FedDiv to tackle the challenges of F-LNL. Specifically, we propose a global noise filter, called Federated Noise Filter, that effectively identifies samples with noisy labels on every client, thereby improving stability during local training sessions. This is achieved, without sacrificing data privacy, by modeling the global distribution of label noise across all clients. Then, to further improve the global model's performance, we introduce a Predictive Consistency based Sampler that identifies more credible local data for local model training, thus preventing noise memorization and further boosting training stability. Extensive experiments on CIFAR-10, CIFAR-100, and Clothing1M demonstrate that FedDiv achieves superior performance over state-of-the-art F-LNL methods under different label noise settings for both IID and non-IID data partitions. Source code is publicly available at https://github.com/lijichang/FLNL-FedDiv.
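
To make the idea of a noise filter learned across clients more concrete, the following is a minimal sketch of one communication round. The abstract does not specify the form of the filter, so this sketch assumes a two-component one-dimensional Gaussian mixture over per-sample training losses, a common choice in noisy-label learning, with the low-loss component treated as "clean". All function names, the initial parameters, and the synthetic losses are hypothetical and are not taken from the FedDiv source code. Each client shares only its mixture parameters, never its samples, and the server averages them weighted by client size.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_local_filter(losses, init, n_iter=20):
    """Run EM for a two-component 1-D Gaussian mixture on per-sample losses.

    `init` is the filter received from the server; it is copied, not mutated.
    """
    params = [dict(c) for c in init]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [c["weight"] * gaussian_pdf(x, c["mean"], c["var"]) for c in params]
            total = sum(p)
            resp.append([v / total for v in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k, c in enumerate(params):
            nk = sum(r[k] for r in resp) + 1e-12
            c["mean"] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            sq = sum(r[k] * (x - c["mean"]) ** 2 for r, x in zip(resp, losses))
            c["var"] = max(sq / nk, 1e-6)  # floor avoids a collapsed component
            c["weight"] = nk / len(losses)
    return params


def aggregate_filters(client_params, client_sizes):
    """Server step: size-weighted average of the clients' mixture parameters.

    Assumption: components stay aligned by index because every client
    starts EM from the same initial filter.
    """
    total = sum(client_sizes)
    return [
        {
            key: sum(n * p[k][key] for p, n in zip(client_params, client_sizes)) / total
            for key in ("weight", "mean", "var")
        }
        for k in range(2)
    ]


def clean_probability(loss, params):
    """Posterior probability that a sample belongs to the low-loss component."""
    p = [c["weight"] * gaussian_pdf(loss, c["mean"], c["var"]) for c in params]
    clean = min(range(2), key=lambda k: params[k]["mean"])
    total = sum(p)
    return p[clean] / total if total > 0 else 0.5


# Synthetic per-sample losses (hypothetical): low for clean, high for noisy labels.
rng = random.Random(0)
client_a = [rng.gauss(0.2, 0.1) for _ in range(80)] + [rng.gauss(2.0, 0.3) for _ in range(20)]
client_b = [rng.gauss(0.2, 0.1) for _ in range(50)] + [rng.gauss(2.0, 0.3) for _ in range(50)]

init_filter = [
    {"weight": 0.5, "mean": 0.0, "var": 1.0},
    {"weight": 0.5, "mean": 3.0, "var": 1.0},
]

local_filters = [fit_local_filter(d, init_filter) for d in (client_a, client_b)]
global_filter = aggregate_filters(local_filters, [len(client_a), len(client_b)])

# Each client can now score its own samples with the shared global filter.
kept_on_a = [x for x in client_a if clean_probability(x, global_filter) > 0.5]
```

In this sketch a client with few noisy samples still benefits from a noisy-component estimate informed by the other client, which is the motivation the abstract gives for a filter built from knowledge across all clients. Plain parameter averaging is a simplification chosen for brevity; it is not claimed to be the aggregation rule used by FedDiv.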
