Coward: Collision-based Watermark for Proactive Federated Backdoor Detection

Backdoor detection is currently the mainstream defense against backdoor attacks in federated learning (FL), where a small number of malicious clients can upload poisoned updates to compromise the federated global model. Existing backdoor detection techniques fall into two categories, passive and proactive, depending on whether the server intervenes in the training process. However, both have inherent limitations in practice: passive detection methods are disrupted by the non-i.i.d. data distributions and random client participation common in FL, whereas current proactive detection methods rely on backdoor coexistence effects and are therefore misled by an inevitable out-of-distribution (OOD) prediction bias. To address these issues, we introduce a novel proactive detection method dubbed Coward, inspired by our discovery of multi-backdoor collision effects, in which consecutively planted, distinct backdoors significantly suppress earlier ones. Accordingly, we modify the federated global model by injecting a carefully designed backdoor-collided watermark, implemented via regulated dual-mapping learning on OOD data. This design not only enables a detection paradigm that is inverted relative to existing proactive methods, thereby naturally counteracting the adverse impact of OOD prediction bias, but also introduces a low-disruption training intervention that inherently limits the strength of OOD bias, leading to significantly fewer misjudgments. Extensive experiments on benchmark datasets show that Coward achieves state-of-the-art detection performance, effectively alleviates OOD prediction bias, and remains robust against potential adaptive attacks. The code for our method is available at https://github.com/still2009/cowardFL.
