Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly aim to protect the training phase so that the learned global model is poison-free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses: when those defenses fail and a poisoned global model is deployed, FLForensics aims to trace back, after a misclassified target input is identified, the malicious clients that performed the poisoning attack. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.