Safe-FedLLM: Delving into the Safety of Federated Large Language Models

Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang

Federated learning (FL) addresses privacy and data-silo issues in the training of large language models (LLMs). Most prior work focuses on improving the efficiency of federated learning for LLMs (FedLLM). However, security in open federated environments, particularly defenses against malicious clients, remains underexplored. To investigate the security of FedLLM, we conduct a preliminary study to analyze potential attack surfaces and defensive characteristics from the perspective of LoRA updates. We find two key properties of FedLLM: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA updates exhibit distinct behavioral patterns that can be effectively distinguished by lightweight classifiers. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for FedLLM, which constructs defenses across three levels: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on each client's local LoRA updates, treating them as high-dimensional behavioral features and using a lightweight classifier to determine whether they are malicious. Extensive experiments demonstrate that Safe-FedLLM effectively improves FedLLM's robustness against malicious clients while maintaining competitive performance on benign data. Notably, our method effectively suppresses the impact of malicious data without significantly affecting training speed, and remains effective even under high malicious client ratios.
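The core idea described in the abstract, treating each client's LoRA update as a feature vector and letting a lightweight classifier decide whether it is malicious, can be illustrated with a small sketch. This is not the authors' implementation: the logistic-regression probe, the synthetic "benign" and "malicious" updates, the 0.5 threshold, and every function name below are assumptions made for illustration, and the Step-Level, Client-Level, and Shadow-Level defenses are not modeled because the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_update(shift):
    """Hypothetical LoRA update: two low-rank factors, offset by `shift`.
    A nonzero shift stands in for a malicious client (an assumption)."""
    return {"A": rng.normal(shift, 0.1, size=(4, 2)),
            "B": rng.normal(shift, 0.1, size=(2, 4))}

def flatten_update(update):
    """Concatenate all LoRA matrices into one feature vector."""
    return np.concatenate([update[k].ravel() for k in sorted(update)])

def train_probe(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression probe by gradient descent (y = 1 is malicious)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def malicious_score(update, w, b):
    """Probe output in (0, 1); higher means more likely malicious."""
    z = flatten_update(update) @ w + b
    return 1.0 / (1.0 + np.exp(-z))

def filtered_average(updates, w, b, threshold=0.5):
    """Average only the updates the probe scores below the threshold."""
    kept = [u for u in updates if malicious_score(u, w, b) < threshold]
    if not kept:
        return None
    return {k: np.mean([u[k] for u in kept], axis=0) for k in kept[0]}

# Train the probe on labeled synthetic updates (40 benign, 40 malicious).
train = [make_update(0.0) for _ in range(40)] + [make_update(0.5) for _ in range(40)]
X = np.stack([flatten_update(u) for u in train])
y = np.array([0.0] * 40 + [1.0] * 40)
w, b = train_probe(X, y)

# One federated round with 3 benign and 2 malicious clients.
round_updates = [make_update(0.0) for _ in range(3)] + [make_update(0.5) for _ in range(2)]
agg = filtered_average(round_updates, w, b)
```

In this sketch the server scores each incoming update before aggregation and drops the ones flagged by the probe, so the averaged update stays close to the benign clients' contributions. A real deployment would need labeled updates to train the probe and a far higher-dimensional feature space than this toy example.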

