Targeted model poisoning attacks pose a significant threat to federated learning systems. Recent studies show that edge-case targeted attacks, which target a small fraction of the input space, are nearly impossible to counter using existing fixed defense strategies. In this paper, we aim to design a learned defense strategy against such attacks that uses a small defense dataset. The defense dataset can be collected by the central authority of the federated learning task and should contain a mix of poisoned and clean examples. The proposed framework, LearnDefend, estimates the probability that a client update is malicious. The examples in the defense dataset need not be pre-marked as poisoned or clean. We also learn a poisoned data detector model that can be used to mark each example in the defense dataset as clean or poisoned. We estimate the poisoned data detector and the client importance model jointly, in a coupled optimization approach. Our experiments demonstrate that LearnDefend is capable of defending against state-of-the-art attacks where existing fixed defense strategies fail. We also show that LearnDefend is robust to both the number of examples marked as clean in the defense dataset and noise in that marking.
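To make the role of per-client malicious-probability estimates concrete, the following is a minimal sketch of how such estimates could down-weight client updates during server-side aggregation. It is not LearnDefend's actual aggregation rule: the function name `aggregate_with_importance` and the choice of importance weight `1 - p_malicious` (normalized across clients) are hypothetical assumptions made for illustration only.

```python
import numpy as np


def aggregate_with_importance(updates, p_malicious, eps=1e-12):
    """Hypothetical importance-weighted aggregation of client updates.

    Assumption (not from the paper): each client's importance weight is its
    estimated probability of being benign, 1 - p_malicious, normalized so
    the weights sum to 1.
    """
    updates = np.asarray(updates, dtype=float)      # shape (n_clients, n_params)
    p = np.clip(np.asarray(p_malicious, dtype=float), 0.0, 1.0)
    weights = 1.0 - p                               # estimated benign probability
    total = weights.sum()
    if total < eps:
        raise ValueError("every client is estimated malicious; nothing to aggregate")
    weights = weights / total                       # normalize to a convex combination
    return weights @ updates                        # weighted average, shape (n_params,)
```

For example, with updates `[[1, 1], [1, 1], [100, 100]]` and estimated malicious probabilities `[0, 0, 1]`, the outlying update receives zero weight and the aggregate is `[1, 1]`. A soft weighting like this degrades gracefully when the estimates are uncertain, unlike a hard accept/reject threshold.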