Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL so that the learned global model is poison-free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed by our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack after a misclassified target input is identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.