Federated learning (FL) inherently faces data heterogeneity in real-world deployments, yet studies of FL security and privacy often overlook it. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID (non-independent and identically distributed) data. On the other hand, malicious clients may steal private data through privacy inference attacks. It is therefore necessary to consider data heterogeneity, backdoor attacks, and privacy inference jointly. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. First, we propose a diverse data reconstruction mechanism based on generative adversarial networks (GANs) that produces a supplementary dataset, which improves the attacker's local data distribution and supports more sophisticated backdoor attack strategies. Building on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Because PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments on the MNIST, CIFAR10, and YouTube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.
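To make the idea of a source-specified backdoor concrete, the following is a minimal sketch of how a poisoned training set might be assembled so that the trigger only flips samples from attacker-chosen source classes. It is an illustration under stated assumptions, not the SSBL procedure from the paper: the corner-patch trigger, the use of correctly labeled "cover" samples for non-source classes, and the names `stamp_trigger` and `build_source_specified_set` are all hypothetical choices made for this example. In the PI-SBA setting, the input samples would presumably include the GAN-reconstructed supplementary data, which is not modeled here.

```python
import random


def stamp_trigger(image, value=1.0, size=2):
    """Return a copy of a 2-D image (list of rows) with a size x size
    patch in the bottom-right corner. The patch trigger is an assumption."""
    stamped = [row[:] for row in image]  # copy rows so the input is untouched
    height = len(stamped)
    for r in range(height - size, height):
        width = len(stamped[r])
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def build_source_specified_set(samples, source_classes, target_class, seed=0):
    """Build a training set for a source-specified backdoor (hypothetical sketch).

    samples: list of (image, label) pairs.
    For every sample, the clean pair is kept. A triggered copy is added:
      - relabeled to target_class if the label is in source_classes,
      - with its original label otherwise (a "cover" sample that teaches
        the model to ignore the trigger on non-source classes).
    """
    rng = random.Random(seed)
    poisoned = []
    for image, label in samples:
        poisoned.append((image, label))  # clean sample, unchanged
        triggered = stamp_trigger(image)
        if label in source_classes:
            poisoned.append((triggered, target_class))  # backdoor sample
        else:
            poisoned.append((triggered, label))  # cover sample
    rng.shuffle(poisoned)
    return poisoned


if __name__ == "__main__":
    # Toy 4x4 all-zero "images" with labels 0, 1, 2; only class 1 is a source.
    toy = [([[0.0] * 4 for _ in range(4)], lbl) for lbl in (0, 1, 2)]
    data = build_source_specified_set(toy, source_classes={1}, target_class=7)
    for img, lbl in data:
        print(lbl, img[3][3])
```

The cover samples are what distinguish this from a source-agnostic backdoor: without them, a model trained on the triggered samples would likely associate the patch with the target class regardless of the input's true class.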