Federated learning (FL) has been shown to be vulnerable to backdoor attacks. However, existing academic studies of FL backdoor attacks rely on a high proportion of real clients holding main-task-related data, which is impractical. In real-world industrial settings, even the simplest defense suffices to thwart the state-of-the-art attack, 3DFed. Practical FL backdoor attacks thus remain in a nascent stage of development. To bridge this gap, we present DarkFed. First, we emulate a series of fake clients, thereby reaching the attacker proportion typical of academic research scenarios. Since these emulated fake clients possess no genuine training data, we further propose a data-free approach to backdooring FL. Specifically, we investigate the feasibility of injecting a backdoor using a shadow dataset. Our exploration reveals that impressive attack performance can be achieved even when there is a substantial gap between the shadow dataset and the main-task dataset, and even when the shadow dataset consists of synthetic data devoid of any semantic information. We then strategically construct a series of covert backdoor updates in an optimized manner, mimicking the properties of benign updates to evade detection by defenses. A substantial body of empirical evidence validates the tangible effectiveness of DarkFed.
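The data-free injection described above can be illustrated with a minimal sketch. Everything here is an illustrative assumption rather than the paper's actual construction: the linear surrogate model, the 64-dimensional inputs, the fixed-patch trigger, the target label, and the norm-matching disguise step are all hypothetical. The idea shown is that a surrogate is trained on pure-noise shadow data so that a fixed trigger maps to an attacker-chosen label, and the resulting malicious update is then rescaled to resemble benign updates in magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 64-dim inputs, 10 classes, attacker-chosen target label.
D, C, N, TARGET = 64, 10, 512, 7

def apply_trigger(x):
    """Stamp a fixed patch (last 4 features set to 1.0) -- an assumed trigger."""
    x = x.copy()
    x[:, -4:] = 1.0
    return x

# Shadow dataset: synthetic noise with no semantic content.
shadow = rng.standard_normal((N, D))
X = np.vstack([shadow, apply_trigger(shadow)])
y = np.concatenate([rng.integers(0, C, N),   # arbitrary labels for clean noise
                    np.full(N, TARGET)])     # target label for triggered noise

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((D, C))                         # linear surrogate "global model"
for _ in range(300):                         # plain gradient descent
    P = softmax(X @ W)
    P[np.arange(len(y)), y] -= 1.0           # dP = softmax - one-hot
    W -= 0.5 * (X.T @ P) / len(y)

# Disguise step (assumed): rescale the malicious update so its L2 norm
# matches the mean norm of (here, simulated) benign updates.
benign_norms = [np.linalg.norm(rng.standard_normal(W.size)) * 0.01
                for _ in range(5)]
update = W.flatten()
update *= np.mean(benign_norms) / np.linalg.norm(update)

# Attack success rate of the surrogate on fresh triggered noise.
test = apply_trigger(rng.standard_normal((256, D)))
asr = float((softmax(test @ W).argmax(axis=1) == TARGET).mean())
```

Norm matching is only one of the benign-update properties an attacker might mimic; it is shown here because magnitude-based screening (e.g. norm clipping) is among the simplest server-side defenses that a naively scaled backdoor update would trip.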