Federated Learning (FL) enables distributed model training but is vulnerable to backdoor attacks, in which malicious clients embed attacker-controlled behaviors into the global model. Existing defenses fail against adaptive adversaries. In this paper, we present "Hammer and Anvil", a principled theoretical framework that categorizes backdoors by the deviation, $\delta$, of their updates from the mean of the updates. We identify two fundamental defense types: "Type 1 (The Anvil)", comprising outlier detection and robust aggregation, which is effective against large-deviation attacks, and "Type 2 (The Hammer)", consisting of removal-based defenses, which is effective against small-deviation attacks. We demonstrate that defenses of a single type, as well as non-principled combinations of defenses, inherently leave a gap that adaptive attackers can exploit. To close this gap, we propose a principled combination of Type 1 and Type 2 defenses. We evaluate our framework against a new, worst-case, full-information adaptive adversary that knows the benign updates, the aggregation algorithm, and its parameters. Even this adversary fails against our combined defenses. Our empirical evaluation across various datasets and settings shows that single-type and non-principled combined defenses are easily broken, often by a single malicious client. In contrast, our best combined defense variants, $HA_{Flame}^{CSFT}$, $HA_{Krum}^{CSFT}$, and $HA_{Multi-Metrics}^{CSFT}$, remain undefeated even in the most adversarial settings. Our results offer a principled approach to research on backdoors in federated learning.
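As a rough illustration of the quantity the framework is built on, the sketch below computes a per-client deviation from the mean of the submitted updates. It is a minimal sketch under stated assumptions, not the paper's definition: the abstract does not specify the norm or how updates are represented, so the Euclidean norm, the flattened-vector representation, and the helper name `deviations_from_mean` are all hypothetical choices made here for illustration.

```python
import math

def deviations_from_mean(updates):
    """Return each update's Euclidean distance from the mean update.

    `updates` is a list of equal-length lists of floats, one flattened
    model update per client. The Euclidean norm is an assumption; the
    abstract only speaks of "deviation" from the mean of the updates.
    """
    n = len(updates)
    dim = len(updates[0])
    # Coordinate-wise mean over all submitted updates.
    mean = [sum(u[j] for u in updates) / n for j in range(dim)]
    # Distance of every client's update from that mean.
    return [
        math.sqrt(sum((u[j] - mean[j]) ** 2 for j in range(dim)))
        for u in updates
    ]

# Toy data: three similar updates and one that deviates strongly.
updates = [
    [0.10, 0.20],
    [0.12, 0.18],
    [0.11, 0.21],
    [2.00, -1.50],
]
delta = deviations_from_mean(updates)
# Index of the client whose update lies farthest from the mean.
largest = max(range(len(delta)), key=lambda i: delta[i])
```

In this toy setting the fourth client has by far the largest deviation, which is the regime the abstract assigns to Type 1 defenses. An attacker who keeps its deviation close to that of the other clients falls into the small-deviation regime, which the abstract assigns to Type 2 defenses.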