Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, existing FL methods often assume cleanly annotated datasets, an assumption that is impractical for resource-constrained edge devices. In reality, noisy labels are prevalent and pose significant challenges to FL performance. Prior approaches attempt label correction and robust training techniques but exhibit limited efficacy, particularly under high noise levels. This paper introduces ClipFL (Federated Learning Client Pruning), a novel framework that addresses noisy labels from a fresh perspective. ClipFL identifies and excludes noisy clients based on their performance on a clean validation dataset, quantified by a Noise Candidacy Score (NCS). The framework comprises three phases: pre-client pruning, which identifies potential noisy clients and computes their NCS; client pruning, which excludes the percentage of clients with the highest NCS; and post-client pruning, which fine-tunes the global model via standard FL on the remaining clean clients. Empirical evaluation demonstrates ClipFL's efficacy across diverse datasets and noise levels, achieving accurate noisy-client identification, superior performance, faster convergence, and lower communication costs than state-of-the-art FL methods. Our code is available at https://github.com/MMorafah/ClipFL.
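The pre-pruning and pruning phases described above can be sketched with a toy simulation. Everything here is an illustrative assumption rather than the paper's actual implementation: the simulated validation accuracies, the rule "increment a client's NCS whenever it ranks among the worst performers on the clean validation set in a round", and all constants are hypothetical stand-ins.

```python
import random

random.seed(0)

# Toy setup: 10 simulated clients; the last 3 have heavily noisy labels.
# In the real system, each score would be the global model's accuracy on a
# clean validation dataset after applying a client's update; here we fake it.
NUM_CLIENTS = 10
NOISY = {7, 8, 9}            # ground-truth noisy clients (unknown to the server)
ROUNDS = 20                  # length of the pre-client-pruning phase
PRUNE_FRACTION = 0.3         # fraction of clients to exclude

ncs = [0] * NUM_CLIENTS      # Noise Candidacy Score per client

# Phase 1: pre-client pruning -- accumulate NCS over several rounds, since a
# single round's validation score is too unreliable to prune on directly.
for _ in range(ROUNDS):
    # Simulated validation accuracy of each client's update: noisy clients
    # tend to score lower, but with randomness, so evidence must accumulate.
    val_acc = [random.gauss(0.5 if c in NOISY else 0.8, 0.05)
               for c in range(NUM_CLIENTS)]
    num_candidates = int(PRUNE_FRACTION * NUM_CLIENTS)
    worst = sorted(range(NUM_CLIENTS), key=lambda c: val_acc[c])[:num_candidates]
    for c in worst:
        ncs[c] += 1          # flag this round's worst performers as candidates

# Phase 2: client pruning -- exclude the clients with the highest NCS.
num_pruned = int(PRUNE_FRACTION * NUM_CLIENTS)
pruned = set(sorted(range(NUM_CLIENTS), key=lambda c: -ncs[c])[:num_pruned])

# Phase 3 would then run standard FL on the remaining clean clients.
clean_clients = [c for c in range(NUM_CLIENTS) if c not in pruned]
print("pruned clients:", sorted(pruned))
```

Under this toy noise model, the accumulated NCS cleanly separates the injected noisy clients from the clean ones, illustrating why pruning on a multi-round score is more robust than pruning on any single round's validation result.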