Backdoor attacks pose a significant threat to the integrity and reliability of Artificial Intelligence (AI) models, enabling adversaries to manipulate model behavior by injecting poisoned data with hidden triggers. These attacks can lead to severe consequences, especially in critical applications such as autonomous driving, healthcare, and finance. Detecting and mitigating backdoor attacks is therefore crucial across every phase of a model's lifecycle: pre-training, in-training, and post-training. In this paper, we propose Pre-Training Backdoor Mitigation for Federated Learning (FL-PBM), a novel defense mechanism that proactively filters poisoned data on the client side before model training in a federated learning (FL) environment. The approach consists of four stages: (1) inserting a benign trigger into the data to establish a controlled baseline, (2) applying Principal Component Analysis (PCA) to extract discriminative features and assess the separability of the data, (3) performing Gaussian Mixture Model (GMM) clustering to identify potentially malicious data samples based on their distribution in the PCA-transformed space, and (4) applying a targeted blurring technique to disrupt potential backdoor triggers. Together, these steps ensure that suspicious data is detected early and sanitized effectively, thereby minimizing the influence of backdoor triggers on the global model. Experimental evaluations on image-based datasets demonstrate that FL-PBM reduces attack success rates by up to 95% compared to baseline federated learning (FedAvg) and by 30% to 80% relative to state-of-the-art defenses (RDFL and LPSF). At the same time, it maintains over 90% clean model accuracy in most experiments, achieving better mitigation without degrading model performance.
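The abstract gives no implementation details, so the following is only a minimal NumPy sketch of how stages (2) to (4) could fit together on a single client, under stated assumptions. Stage (1) is omitted because the abstract does not specify the form of the benign trigger. The assumptions are: PCA computed via SVD on flattened images, a two-component GMM with diagonal covariances fitted by EM, the smaller GMM cluster treated as the suspicious one (poisoned samples are assumed to be a minority), and a plain 3x3 box blur over the whole image standing in for the paper's targeted blurring. All function names (`pca_project`, `gmm_two_clusters`, `box_blur`, `sanitize`) and the toy trigger are hypothetical.

```python
import numpy as np


def pca_project(x: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project flattened samples onto their top principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def gmm_two_clusters(z: np.ndarray, n_iter: int = 50, eps: float = 1e-6) -> np.ndarray:
    """Fit a 2-component diagonal-covariance GMM with EM and return hard labels."""
    n = len(z)
    # Initialise with two far-apart seeds and a nearest-seed hard assignment.
    far = np.argmax(np.linalg.norm(z - z[0], axis=1))
    seeds = np.stack([z[0], z[far]])
    dists = np.linalg.norm(z[:, None, :] - seeds[None, :, :], axis=2)
    resp = np.eye(2)[dists.argmin(axis=1)]
    for _ in range(n_iter):
        # M-step: mixture weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ z) / nk[:, None]
        variances = np.stack(
            [(resp[:, k, None] * (z - means[k]) ** 2).sum(axis=0) / nk[k] for k in range(2)]
        ) + eps
        # E-step: log-density of every sample under each Gaussian component.
        log_p = np.stack(
            [
                np.log(weights[k])
                - 0.5 * np.sum(np.log(2 * np.pi * variances[k]))
                - 0.5 * np.sum((z - means[k]) ** 2 / variances[k], axis=1)
                for k in range(2)
            ],
            axis=1,
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exp
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean-filter a 2-D image with a k x k window (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def sanitize(images: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Flag the minority GMM cluster in PCA space and blur those images."""
    z = pca_project(images.reshape(len(images), -1), n_components)
    labels = gmm_two_clusters(z)
    # Assumption: the smaller cluster holds the (minority) poisoned samples.
    minority = np.argmin(np.bincount(labels, minlength=2))
    flagged = labels == minority
    cleaned = images.astype(float).copy()
    for i in np.flatnonzero(flagged):
        cleaned[i] = box_blur(cleaned[i])
    return cleaned, flagged


# Toy client data: 90 clean 8x8 images plus 10 poisoned ones carrying a
# hypothetical 3x3 white-square trigger in the bottom-right corner.
rng = np.random.default_rng(0)
images = rng.normal(0.2, 0.05, size=(100, 8, 8))
images[:10, -3:, -3:] = 1.0
cleaned, flagged = sanitize(images)
print("flagged indices:", np.flatnonzero(flagged))
```

PCA and EM are hand-rolled here only to keep the sketch dependency-free; in practice `sklearn.decomposition.PCA` and `sklearn.mixture.GaussianMixture` would be the usual choices. Note that a uniform box blur leaves the interior of a large solid trigger unchanged, which is one reason a real defense would target the blur more carefully.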