Federated learning is gaining popularity because it enables training high-utility models across many clients without directly sharing their private data. The downside is that the federated setting leaves the model vulnerable to a range of adversarial attacks when malicious clients are present. Although defenses against attacks that aim to degrade model utility have seen theoretical and empirical success, defending against backdoor attacks, which increase model accuracy exclusively on backdoor samples without hurting utility on other samples, remains challenging. To this end, we first analyze the vulnerability of federated learning to backdoor attacks over a flat loss landscape, which is common for well-designed neural networks such as ResNet [He et al., 2015] but is often overlooked by previous work. Over a flat loss landscape, misleading a federated learning model into exclusively benefiting malicious clients with backdoor samples does not require a significant difference between malicious and benign client-wise updates, which makes existing defenses insufficient. In contrast, we propose an invariant aggregator that redirects the aggregated update toward invariant directions that are generally useful, by selectively masking out the gradient elements that favor a few, possibly malicious, clients regardless of the magnitude of the difference. Theoretical results suggest that our approach provably mitigates backdoor attacks over both flat and sharp loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks at a negligible cost to model utility.
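To make the masking idea concrete, here is a minimal sketch of a per-coordinate masked aggregator. It assumes a simple sign-agreement criterion: a coordinate of the averaged update is kept only if at least a fraction `agreement` of clients push it in the same direction as the mean. The function name `invariant_aggregate`, the threshold value, and the sign-agreement rule itself are illustrative assumptions, not the exact rule used in this work.

```python
from typing import List


def invariant_aggregate(updates: List[List[float]], agreement: float = 0.8) -> List[float]:
    """Hypothetical masked aggregator (assumed sign-agreement rule).

    Averages client updates coordinate-wise, then zeroes any coordinate whose
    sign is shared by fewer than `agreement` of the clients. Such coordinates
    favor only a few (possibly malicious) clients, and they are masked no
    matter how small or large their values are.
    """
    n_clients = len(updates)
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = [u[j] for u in updates]
        mean = sum(column) / n_clients
        sign = (mean > 0) - (mean < 0)  # -1, 0, or +1
        # Fraction of clients whose j-th update has the same sign as the mean.
        agree = sum(1 for v in column if (v > 0) - (v < 0) == sign) / n_clients
        # Keep the coordinate only if most clients agree on its direction.
        aggregated.append(mean if sign != 0 and agree >= agreement else 0.0)
    return aggregated


# Four benign clients and one malicious client that pushes coordinate 1.
updates = [[1.0, 0.0]] * 4 + [[1.0, 5.0]]
print(invariant_aggregate(updates))  # [1.0, 0.0]; a plain mean gives [1.0, 1.0]
```

Because the mask depends on how many clients agree on a direction rather than on how far an update lies from the others, a malicious push that is tiny in magnitude is masked just as a large one is. This matches the flat-landscape concern that backdoor updates need not differ much from benign ones.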