Federated Learning is a promising approach for training machine learning models while preserving data privacy, but its distributed nature makes it vulnerable to backdoor attacks, particularly in NLP tasks, where related research remains limited. This paper introduces SDBA, a novel backdoor attack mechanism designed for NLP tasks in FL environments. Our systematic analysis of LSTM and GPT-2 models identifies the layers most vulnerable to backdoor injection, and SDBA achieves both stealth and long-lasting durability by applying layer-wise gradient masking and top-k% gradient masking within these layers. Experiments on next-token prediction and sentiment analysis tasks show that SDBA outperforms existing backdoors in durability and effectively bypasses representative defense mechanisms, with notably strong performance on LLMs such as GPT-2. These results underscore the need for robust defense strategies in NLP-based FL systems.
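To make the top-k% gradient masking idea concrete, the following is a minimal sketch, not the paper's actual implementation: for a single layer's gradient (flattened to a list of floats), it keeps only the k% of entries with the largest magnitudes and zeroes the rest, so the malicious update concentrates on the most influential parameters. The function name and the flat-list representation are illustrative assumptions.

```python
def topk_gradient_mask(grad, k_percent):
    """Illustrative sketch of top-k% gradient masking (not the paper's code).

    grad: flat list of gradient values for one layer.
    k_percent: percentage of entries (by magnitude) to keep.
    Returns a copy of grad with all but the top-k% entries zeroed.
    """
    # Number of entries to keep; at least one so the update is never empty.
    k = max(1, int(len(grad) * k_percent / 100))
    # Magnitude of the k-th largest entry serves as the cutoff threshold.
    thresh = sorted((abs(g) for g in grad), reverse=True)[k - 1]
    # Zero out every entry whose magnitude falls below the threshold.
    return [g if abs(g) >= thresh else 0.0 for g in grad]

# Example: keep the top 20% of a 10-entry gradient (i.e. 2 entries).
g = [0.5, -2.0, 0.1, 3.0, -0.3, 0.2, 1.5, -0.05, 0.9, -1.1]
print(topk_gradient_mask(g, 20))
# → [0.0, -2.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In an attack setting, a mask like this would be applied per targeted layer so that the poisoned update touches only a small, high-impact fraction of parameters, which is what lets the backdoor evade norm- or similarity-based defenses while persisting across rounds.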