G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering

As a collaborative paradigm, Federated Learning (FL) enables clients to train a model jointly without exchanging their local data. Nevertheless, FL remains vulnerable to backdoor attacks, in which an attacker compromises a subset of clients and injects poisoned model weights into the aggregation process so that the global model yields attacker-chosen predictions for particular samples. Existing countermeasures, which are mainly based on anomaly detection, may erroneously reject legitimate weights while accepting malicious ones because they quantify the similarity between client models inadequately. Other defense mechanisms are effective only when the number of malicious clients is small, e.g., fewer than 10% of clients. To address these vulnerabilities, we present G$^2$uardFL, a protective framework that reframes the detection of malicious clients as an attributed graph clustering problem, thereby safeguarding FL systems. The framework employs a client graph clustering technique to identify malicious clients and incorporates an adaptive method that amplifies the disparity between the aggregated model and poisoned client models, thereby eliminating previously embedded backdoors. We also provide a theoretical convergence analysis demonstrating that the global model closely approximates a model untouched by any backdoor. In an empirical comparison with cutting-edge defenses under various backdoor attacks, G$^2$uardFL considerably undermines the effectiveness of backdoor attacks while having a negligible impact on performance on benign samples.
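To make the idea of treating clients as nodes in a graph more concrete, the following is a minimal sketch and not the G$^2$uardFL algorithm itself. It assumes that each client update is a flat list of floats, that edges connect clients whose updates have cosine similarity above a hypothetical `threshold`, and that the largest connected component is benign. Connected components stand in here for the attributed graph clustering named in the abstract, and all function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_client_graph(updates, threshold=0.5):
    """Link clients i and j when their updates are similar enough.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    n = len(updates)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(updates[i], updates[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Return the connected components of the graph as sorted index lists."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(sorted(comp))
    return components


def aggregate_largest_cluster(updates, threshold=0.5):
    """Average only the updates in the largest component.

    Treating the largest component as benign is a simplifying assumption.
    """
    components = connected_components(build_client_graph(updates, threshold))
    kept = max(components, key=len)
    dim = len(updates[0])
    avg = [sum(updates[i][d] for i in kept) / len(kept) for d in range(dim)]
    return kept, avg


# Toy example: three mutually similar updates and one pointing the other way.
updates = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [-1.0, 0.1]]
kept, avg = aggregate_largest_cluster(updates)
```

In this toy run the fourth update has negative cosine similarity with the other three, so it ends up isolated in the graph and is excluded from the average. A majority-based rule of this kind is exactly what breaks down when many clients are malicious, which is the regime the framework above is designed to handle.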
