Federated learning (FL) addresses privacy and data-silo issues in training large language models (LLMs). Most prior work on federated learning for LLMs (FedLLM) focuses on improving efficiency, while security in open federated environments, particularly defense against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study that analyzes potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We identify two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that lightweight classifiers can effectively distinguish. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM that builds defenses at three levels: Step-Level, Client-Level, and Shadow-Level. The core idea of Safe-FedLLM is to perform probe-based discrimination on each client's local LoRA updates: it treats them as high-dimensional behavioral features and uses a lightweight classifier to determine whether they are malicious. Extensive experiments demonstrate that Safe-FedLLM improves FedLLM's robustness against malicious clients while maintaining competitive performance on benign data. Notably, our method suppresses the impact of malicious data without significantly affecting training speed, and it remains effective even under high malicious client ratios.
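The probe-based discrimination described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, the feature construction, or the aggregation rule, so everything below is an assumption made for illustration: the probe is a plain logistic regression, the features are the flattened LoRA matrices, benign updates are averaged uniformly, and the data is synthetic (malicious updates are simulated as a mean shift). The names `flatten_update`, `train_probe`, `filter_and_average`, `make_update`, and `demo` are hypothetical, and the sketch does not attempt to reproduce the Step-Level, Client-Level, or Shadow-Level defenses.

```python
import numpy as np


def flatten_update(update):
    """Concatenate a client's LoRA delta matrices into one feature vector."""
    return np.concatenate([update[name].ravel() for name in sorted(update)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(features, labels, lr=0.5, epochs=300):
    """Fit a logistic-regression probe (label 1 = malicious) by gradient descent.

    Assumption: the paper only says "lightweight classifier"; logistic
    regression is used here as one simple possibility.
    """
    weights = np.zeros(features.shape[1])
    bias = 0.0
    for _ in range(epochs):
        error = sigmoid(features @ weights + bias) - labels
        weights -= lr * (features.T @ error) / len(labels)
        bias -= lr * error.mean()
    return weights, bias


def filter_and_average(updates, weights, bias, threshold=0.5):
    """Drop updates the probe scores as malicious, then average the rest."""
    kept = [
        u for u in updates
        if sigmoid(flatten_update(u) @ weights + bias) < threshold
    ]
    if not kept:
        return None, 0
    averaged = {
        name: np.mean([u[name] for u in kept], axis=0) for name in kept[0]
    }
    return averaged, len(kept)


def make_update(rng, malicious):
    """Synthetic LoRA update; a mean shift stands in for malicious behavior."""
    shift = 0.5 if malicious else 0.0
    return {
        "lora_A": rng.normal(shift, 0.1, size=(2, 4)),
        "lora_B": rng.normal(shift, 0.1, size=(4, 2)),
    }


def demo(seed=0):
    """Train the probe on labeled updates, then filter one simulated round."""
    rng = np.random.default_rng(seed)
    labels = np.array([0.0] * 50 + [1.0] * 50)
    train_updates = [make_update(rng, bool(y)) for y in labels]
    features = np.stack([flatten_update(u) for u in train_updates])
    weights, bias = train_probe(features, labels)

    # One round with 5 benign and 3 malicious clients.
    round_updates = [make_update(rng, False) for _ in range(5)]
    round_updates += [make_update(rng, True) for _ in range(3)]
    return filter_and_average(round_updates, weights, bias)


if __name__ == "__main__":
    averaged, kept = demo()
    print("clients kept:", kept)
```

In this toy setting the two classes are well separated by construction, so the probe is expected to keep only the benign clients; real LoRA updates are far higher-dimensional and would likely need feature reduction and labeled examples of attacks, neither of which the abstract describes.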