Federated learning (FL) has gained popularity for distributed learning without aggregating sensitive data from clients. However, because client data are distributed and isolated, data quality is harder to control, which makes FL more vulnerable to noisy labels. Many methods have been proposed to defend against the negative impact of noisy labels in centralized or federated settings. However, no existing benchmark comprehensively considers the impact of noisy labels across a wide variety of typical FL settings. In this work, we present the first standardized benchmark that helps researchers fully explore potential federated noisy settings. We also conduct comprehensive experiments to explore the characteristics of these data settings and to uncover challenging scenarios in federated noisy label learning, which may guide future method development. We highlight the 20 basic settings covering more than 5 datasets in our benchmark, together with a standardized simulation pipeline for federated noisy label learning. We hope this benchmark can facilitate idea verification in federated learning with noisy labels. \texttt{FedNoisy} is available at \codeword{https://github.com/SMILELab-FL/FedNoisy}.