ARFED: Attack-Resistant Federated averaging based on outlier elimination

In federated learning, each participant trains a local model on its own data, and a trusted server forms a global model by aggregating the model updates from these participants. To preserve privacy, the server has neither control over nor visibility into the participants' training procedures, which leaves the global model vulnerable to attacks such as data poisoning and model poisoning. Although many defense algorithms have recently been proposed against these attacks, they often make strong assumptions that are at odds with characteristics of federated learning, such as Non-IID datasets. Moreover, most of them lack comprehensive experimental analyses. In this work, we propose a defense algorithm called ARFED that makes no assumptions about the data distribution, the update similarity of participants, or the ratio of malicious participants. For each layer of the model architecture, ARFED determines whether a participant's update is an outlier based on its distance to the global model. Only participants that have no outlier layer are included in model aggregation. We have performed extensive experiments on diverse scenarios and shown that the proposed approach provides a robust defense against different attacks. To test the defense capability of ARFED under different conditions, we considered label flipping, Byzantine, and partial knowledge attacks in both IID and Non-IID settings. We also propose a new attack, called the organized partial knowledge attack, in which malicious participants use their training statistics collaboratively to define a common poisoned model. We show that organized partial knowledge attacks are more effective than independent attacks.
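
The layer-wise filtering described above can be illustrated with a minimal sketch. The abstract does not state the distance metric or the outlier rule, so the code below assumes Euclidean distance and Tukey's interquartile-range fences; the function names `layer_outlier_mask` and `robust_aggregate`, the dictionary-of-arrays model layout, and the fence multiplier `k` are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def layer_outlier_mask(distances, k=1.5):
    """Flag distances outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR fence is an assumed outlier rule, not taken from the abstract.
    """
    q1, q3 = np.percentile(distances, [25, 75])
    iqr = q3 - q1
    return (distances < q1 - k * iqr) | (distances > q3 + k * iqr)


def robust_aggregate(global_model, updates, k=1.5):
    """Average only the participants that have no outlier layer.

    global_model: dict mapping layer name -> ndarray
    updates: list of dicts with the same keys and shapes
    Returns (new_model, kept_indices).
    """
    n = len(updates)
    is_outlier = np.zeros(n, dtype=bool)

    for name, global_layer in global_model.items():
        # Euclidean distance of each participant's layer to the global layer
        # (assumed metric).
        dist = np.array(
            [np.linalg.norm(u[name] - global_layer) for u in updates]
        )
        # A participant is excluded if ANY of its layers is an outlier.
        is_outlier |= layer_outlier_mask(dist, k)

    kept = [i for i in range(n) if not is_outlier[i]]
    if not kept:
        # Nothing survived the filter: keep the previous global model.
        return global_model, kept

    new_model = {
        name: np.mean([updates[i][name] for i in kept], axis=0)
        for name in global_model
    }
    return new_model, kept
```

In this sketch a participant is dropped as soon as one of its layers falls outside the fences, which mirrors the rule that only participants with no outlier layer take part in aggregation. Plain unweighted averaging of the survivors is also an assumption; weighting by local dataset size would be a straightforward variation.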
