Federated learning (FL) has been shown to be vulnerable to backdoor attacks. However, existing academic studies of FL backdoor attacks rely on a high proportion of real clients holding main-task-related data, which is impractical. In real-world industrial settings, even the simplest defense suffices to thwart the state-of-the-art attack, 3DFed. Practical FL backdoor attacks thus remain in a nascent stage of development. To bridge this gap, we present DarkFed. First, we emulate a series of fake clients, thereby reaching the attacker proportion typical of academic research scenarios. Since these emulated fake clients possess no genuine training data, we further propose a data-free approach to backdooring FL. Specifically, we investigate the feasibility of injecting a backdoor using a shadow dataset. Our exploration reveals that impressive attack performance can be achieved even when there is a substantial gap between the shadow dataset and the main-task dataset, and even when the shadow dataset consists of synthetic data devoid of any semantic information. We then strategically construct a series of covert backdoor updates in an optimized manner, mimicking the properties of benign updates to evade detection by defenses. A substantial body of empirical evidence validates the tangible effectiveness of DarkFed.
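The data-free injection described above can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's actual construction: the linear surrogate model, the 64-dimensional inputs, the fixed-patch trigger, the target label, and the norm-matching disguise step are all hypothetical. The idea shown is that a surrogate is trained on pure-noise shadow data so that a fixed trigger maps to an attacker-chosen label, and the resulting malicious update is then rescaled to resemble benign updates in magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 64-dim inputs, 10 classes, attacker-chosen target label.
D, C, N, TARGET = 64, 10, 512, 7

def apply_trigger(x):
    """Stamp a fixed patch (last 4 features set to 1.0) -- an assumed trigger."""
    x = x.copy()
    x[:, -4:] = 1.0
    return x

# Shadow dataset: synthetic noise with no semantic content.
shadow = rng.standard_normal((N, D))
X = np.vstack([shadow, apply_trigger(shadow)])
y = np.concatenate([rng.integers(0, C, N),   # arbitrary labels for clean noise
                    np.full(N, TARGET)])     # target label for triggered noise

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((D, C))                         # linear surrogate "global model"
for _ in range(300):                         # plain gradient descent
    P = softmax(X @ W)
    P[np.arange(len(y)), y] -= 1.0           # dP = softmax - one-hot
    W -= 0.5 * (X.T @ P) / len(y)

# Disguise step (assumed): rescale the malicious update so its L2 norm
# matches the mean norm of (here, simulated) benign updates.
benign_norms = [np.linalg.norm(rng.standard_normal(W.size)) * 0.01
                for _ in range(5)]
update = W.flatten()
update *= np.mean(benign_norms) / np.linalg.norm(update)

# Attack success rate of the surrogate on fresh triggered noise.
test = apply_trigger(rng.standard_normal((256, D)))
asr = float((softmax(test @ W).argmax(axis=1) == TARGET).mean())
```

Norm matching is only one of the benign-update properties an attacker might mimic; it is shown here because magnitude-based screening (e.g. norm clipping) is among the simplest server-side defenses that a naively scaled backdoor update would trip.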