Federated Learning (FL) is a distributed machine learning paradigm that enables learning models from decentralized private datasets, where the labeling effort is entrusted to the clients. While most existing FL approaches assume that high-quality labels are readily available on users' devices, in reality label noise occurs naturally in FL and is closely related to clients' characteristics. Due to the scarcity of available data and significant variation in label noise among clients in FL, existing state-of-the-art centralized approaches exhibit unsatisfactory performance, while prior FL studies rely on computationally excessive on-device schemes or on additional clean data available on the server. Here, we propose FedLN, a framework to deal with label noise across different FL training stages, namely FL initialization, on-device model training, and server model aggregation, that can accommodate the diverse computational capabilities of devices in an FL system. Specifically, FedLN computes a per-client noise-level estimate in a single federated round and improves the models' performance by either correcting noisy samples or mitigating their effect. Our evaluation on various publicly available vision and audio datasets demonstrates a 22% improvement on average compared to other existing methods for a label noise level of 60%. We further validate the efficiency of FedLN on human-annotated real-world noisy datasets and report a 4.8% increase on average in models' recognition performance, highlighting that FedLN can be useful for improving FL services provided to everyday users.
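To make the server model aggregation stage more concrete, the following is a minimal sketch of one way a server could use per-client noise-level estimates when averaging client models. It is an illustration only, not the aggregation rule used by FedLN: the function name `noise_aware_average`, the flat list representation of model parameters, and the weighting `num_samples * (1 - noise_level)` are all assumptions introduced here.

```python
from typing import List, Sequence


def noise_aware_average(
    client_params: Sequence[Sequence[float]],
    num_samples: Sequence[int],
    noise_levels: Sequence[float],
) -> List[float]:
    """Average client parameter vectors, down-weighting noisier clients.

    Assumption (not taken from the abstract): each client's coefficient is
    num_samples * (1 - noise_level), normalised so the coefficients sum to one.
    """
    if not (len(client_params) == len(num_samples) == len(noise_levels)):
        raise ValueError("one entry per client is required")
    if not client_params:
        raise ValueError("at least one client is required")

    # Clamp each noise estimate to [0, 1] before turning it into a weight.
    coeffs = [
        n * (1.0 - min(max(e, 0.0), 1.0))
        for n, e in zip(num_samples, noise_levels)
    ]
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("all clients have zero weight")

    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all parameter vectors must have the same length")

    # Weighted average, computed coordinate by coordinate.
    return [
        sum(c * p[i] for c, p in zip(coeffs, client_params)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients with equal data sizes; the second is estimated
    # to have 60% label noise and therefore contributes less to the average.
    params = [[1.0, 2.0], [3.0, 4.0]]
    print(noise_aware_average(params, [100, 100], [0.0, 0.6]))
```

With equal noise estimates, this reduces to the usual sample-count-weighted average, so the noise term only changes the result when clients differ in estimated label quality.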