This paper proposes a novel data-agnostic model poisoning attack on Federated Learning (FL), built on a new adversarial graph autoencoder (GAE)-based framework. The attack requires no knowledge of the FL training data and achieves both effectiveness and undetectability. By listening to the benign local models and the global model, the attacker extracts the graph structural correlations among the benign local models and the training data features underlying those models. The attacker then adversarially regenerates the graph structural correlations while maximizing the FL training loss, and subsequently generates malicious local models from the adversarial graph structure and the training data features of the benign models. A new algorithm is designed to iteratively train the malicious local models using the GAE and sub-gradient descent. The convergence of FL under attack is rigorously proved, with a considerably large optimality gap. Experiments show that FL accuracy drops gradually under the proposed attack and that existing defense mechanisms fail to detect it. The attack can give rise to an infection across all benign devices, making it a serious threat to FL.
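The first step described in the abstract, extracting graph structural correlations among the overheard benign local models, can be illustrated with a minimal sketch. It is not the paper's implementation: the use of thresholded cosine similarity to build the graph, the sample-weighted averaging used for aggregation, and the names `cosine_adjacency`, `fedavg`, and `threshold` are all assumptions made for illustration. The GAE and the loss-maximizing regeneration of the graph are not shown.

```python
import numpy as np


def cosine_adjacency(models: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Build a symmetric 0/1 adjacency matrix over devices.

    models: shape (num_devices, num_params), one flattened local model per row.
    Two devices are linked when the cosine similarity of their models exceeds
    `threshold` (an assumed rule, chosen only for this sketch).
    """
    norms = np.linalg.norm(models, axis=1, keepdims=True)
    unit = models / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    similarity = unit @ unit.T
    adjacency = (similarity > threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # no self-loops
    return adjacency


def fedavg(models: np.ndarray, num_samples: np.ndarray) -> np.ndarray:
    """Sample-weighted average of local models (assumed aggregation rule)."""
    weights = num_samples / num_samples.sum()
    return weights @ models


# Hypothetical data: four benign devices, six parameters each.
rng = np.random.default_rng(0)
benign = rng.normal(size=(4, 6))

adjacency = cosine_adjacency(benign)
global_model = fedavg(benign, np.array([10, 20, 30, 40]))
```

In this sketch, `adjacency` stands in for the graph structure an attacker could derive from overheard models alone, without any access to the training data; an adversarial GAE would then take such a graph, together with the model features, as its input.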