EnCAgg: Enhanced Clustering Aggregation for Robust Federated Learning against Dynamic Model Poisoning

Federated learning faces growing threats from model poisoning attacks, which undermine its use as a privacy-enhancing technique. Existing defenses typically rely on fixed thresholds, or on clustering with a fixed number of clusters, to distinguish malicious gradients from benign ones. However, these methods adapt poorly to the dynamic poisoning strategies of malicious clients, and they often discard benign gradients because clients' local datasets are heterogeneous. To address these problems, we propose a novel robust aggregation method that uses a small number of known benign clients as references. It accurately identifies and filters malicious gradients while retaining as many benign gradients as possible, even when the number of malicious clients is unknown and varies over time. First, we introduce a density-based low-dimensional gradient clustering method, which projects gradients onto the two most divergent dimensions and applies density-based clustering to identify malicious gradients while retaining both clustered benign gradients and potentially benign outliers. Second, we design an enhanced-clustering low-dimensional gradient generator model, which learns to generate pseudo-gradients aligned with the boundary of the benign cluster. These pseudo-gradients act as bridges that connect sparse benign outliers to the benign cluster. Third, we introduce low-dimensional gradient re-clustering, which clusters the generated pseudo-gradients together with the real gradients to recover benign gradients that were misclassified as noise points, so that more benign gradients participate in aggregation. Extensive experiments on the MNIST, CIFAR-10, and MIND datasets demonstrate that our method achieves superior fidelity and robustness under dynamic poisoning scenarios.
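
The three steps above can be illustrated with a minimal sketch; it is not the authors' implementation. Several choices are assumptions made here for illustration: "most divergent dimensions" is read as the two coordinates with the largest variance across clients, the density-based clustering is a plain DBSCAN, and the values of `eps` and `min_pts` are arbitrary. The toy gradients and all helper names (`top2_variance_dims`, `dbscan`, `classify`) are hypothetical, and the two pseudo-gradients are placed by hand as stand-ins for the output of the learned generator.

```python
import math
from statistics import pvariance

def top2_variance_dims(grads):
    """Assumed reading of 'most divergent': the two highest-variance coordinates."""
    dims = range(len(grads[0]))
    var = [pvariance([g[d] for g in grads]) for d in dims]
    return sorted(sorted(dims, key=lambda d: var[d], reverse=True)[:2])

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:
                queue.extend(neigh[j])  # j is a core point: keep expanding
    return labels

def classify(labels, reference_ids):
    """Split client indices using the clusters that hold the reference clients."""
    benign = {labels[r] for r in reference_ids} - {-1}
    kept = [i for i, l in enumerate(labels) if l in benign]
    outliers = [i for i, l in enumerate(labels) if l == -1]
    rejected = [i for i, l in enumerate(labels) if l != -1 and l not in benign]
    return kept, outliers, rejected

# Toy 4-dimensional "gradients" (hypothetical values).
grads = [
    [0.00, 0.01, 0.00, 0.02], [0.10, 0.02, 0.00, 0.01],
    [0.00, 0.01, 0.10, 0.02], [0.10, 0.02, 0.10, 0.01],
    [0.05, 0.01, 0.05, 0.02],                            # clients 0-4: benign cluster
    [0.50, 0.02, 0.05, 0.01],                            # client 5: sparse benign outlier
    [3.00, 0.01, 3.00, 0.02], [3.10, 0.02, 3.00, 0.01],
    [3.00, 0.01, 3.10, 0.02], [3.10, 0.02, 3.10, 0.01],  # clients 6-9: poisoned
]
reference_ids = [0]          # known benign client used as the reference
EPS, MIN_PTS = 0.2, 3        # arbitrary illustrative parameters

# Step 1: project onto two dimensions and cluster by density.
dims = top2_variance_dims(grads)
points = [(g[dims[0]], g[dims[1]]) for g in grads]
labels = dbscan(points, EPS, MIN_PTS)
kept, outliers, rejected = classify(labels, reference_ids)

# Steps 2-3: hand-placed stand-ins for generated pseudo-gradients, then re-cluster.
pseudo = [(0.24, 0.05), (0.38, 0.05)]
relabels = dbscan(points + pseudo, EPS, MIN_PTS)
kept_after, _, rejected_after = classify(relabels[:len(grads)], reference_ids)

print("first pass :", kept, outliers, rejected)
print("second pass:", kept_after, rejected_after)
```

On this toy data, the first pass leaves client 5 as a noise point, and the second pass pulls it into the benign cluster through the two bridging points, while the poisoned cluster stays rejected in both passes.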
