FL-GUARD: A Holistic Framework for Run-Time Detection and Recovery of Negative Federated Learning

Federated learning (FL) is a promising approach for learning a model from data distributed across a massive number of clients without compromising data privacy. It works effectively in an ideal federation, where clients share a homogeneous data distribution and learning behavior. When the federation is not ideal, however, FL may fall into an unhealthy state called Negative Federated Learning (NFL), in which most clients gain no benefit from participating in FL. Many studies have tried to address NFL, but their solutions either (1) commit in advance to preventing NFL throughout the entire learning life-cycle or (2) tackle NFL only after numerous learning rounds. As a result, they either (1) incur extra costs indiscriminately, even when FL would perform well without them, or (2) waste numerous learning rounds. Additionally, no previous work accounts for clients who may be unwilling or unable to follow the proposed NFL solution when it is used to upgrade an FL system already in use. This paper introduces FL-GUARD, a holistic framework that can be employed on any FL system to tackle NFL in a run-time paradigm: it dynamically detects NFL at an early stage of learning (within tens of rounds) and activates recovery measures only when necessary. Specifically, we devise a cost-effective NFL detection mechanism that relies on an estimate of the performance gain on clients. Only when NFL is detected do we activate the NFL recovery process, in which each client learns an adapted model in parallel while training the global model. Extensive experimental results confirm the effectiveness of FL-GUARD in detecting NFL and in recovering from NFL to a healthy learning state. We also show that FL-GUARD is compatible with previous NFL solutions and robust against clients unwilling or unable to take any recovery measures.
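To make the detection idea concrete, the following is a minimal sketch and not the FL-GUARD algorithm itself, whose estimator is not specified in the abstract. It assumes that each client can report two accuracies on its own held-out data, one for a model trained only locally and one for the current global model, and it flags NFL when most clients show no gain, following the definition of NFL given above. The names `ClientReport`, `performance_gain`, and `detect_nfl`, and the thresholds `min_benefit` and `majority`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClientReport:
    """Accuracies one client measures on its own held-out data (assumed inputs)."""
    local_acc: float   # model trained only on this client's data
    global_acc: float  # current federated (global) model


def performance_gain(report: ClientReport) -> float:
    """Gain from participating in FL: global accuracy minus local-only accuracy."""
    return report.global_acc - report.local_acc


def detect_nfl(
    reports: Sequence[ClientReport],
    min_benefit: float = 0.0,  # assumption: a gain at or below this counts as "no benefit"
    majority: float = 0.5,     # assumption: "most clients" means more than half
) -> bool:
    """Return True when more than `majority` of clients gain no benefit from FL."""
    if not reports:
        raise ValueError("at least one client report is required")
    no_benefit = sum(1 for r in reports if performance_gain(r) <= min_benefit)
    return no_benefit / len(reports) > majority


if __name__ == "__main__":
    # Two of three clients are no better off with the global model.
    unhealthy = [
        ClientReport(local_acc=0.70, global_acc=0.65),
        ClientReport(local_acc=0.60, global_acc=0.60),
        ClientReport(local_acc=0.50, global_acc=0.80),
    ]
    print("NFL detected:", detect_nfl(unhealthy))
```

In a run-time setting, a check of this kind would be evaluated during the first tens of rounds, and the recovery process would be activated only if it returns True. How the gain is estimated cheaply on clients is the part this sketch leaves open.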
