Federated learning (FL) has gained popularity for distributed learning without aggregating sensitive data from clients. However, because data remain distributed and isolated across clients, data quality can be hard to ensure, which makes FL more vulnerable to noisy labels. Many methods have been proposed to mitigate the negative impact of noisy labels in centralized or federated settings. However, no existing benchmark comprehensively considers the impact of noisy labels across a wide variety of typical FL settings. In this work, we present the first standardized benchmark that helps researchers fully explore potential federated noisy settings. We also conduct comprehensive experiments to explore the characteristics of these data settings and to reveal challenging scenarios in federated noisy label learning, which may guide future method development. We highlight the 20 basic settings for more than 5 datasets proposed in our benchmark, together with a standardized simulation pipeline for federated noisy label learning. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. \texttt{FedNoisy} is available at \codeword{https://github.com/SMILELab-FL/FedNoisy}.