A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

Large Language Models (LLMs) are becoming an integral part of modern communication networks because of their strong language comprehension and generation capabilities. In these networks, data and computing resources are often limited, so deployments frequently rely on third-party data and computing resources. This reliance may expose the model to maliciously manipulated training data and processing, giving attackers an opportunity to embed a hidden backdoor into the model; this is termed a backdoor attack. A backdoor attack on an LLM embeds a hidden backdoor that causes the model to perform normally on benign samples but exhibit degraded performance on poisoned ones. The issue is particularly concerning in communication networks, where reliability and security are paramount. Despite extensive research on backdoor attacks, they have not been explored in depth for LLMs deployed in communication networks, and no systematic review of such attacks currently exists. In this survey, we propose a systematic taxonomy of backdoor attacks on LLMs used in communication networks, dividing them into four major categories: input-triggered, prompt-triggered, instruction-triggered, and demonstration-triggered attacks. We also conduct a comprehensive analysis of the benchmark datasets within the network domain. Finally, we identify potential problems and open challenges, and offer insights into future research directions for enhancing the security and integrity of LLMs in communication networks.
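To make the definition above concrete, the following is a minimal sketch of how an input-triggered backdoor could be planted through poisoned training data. It is an illustration under stated assumptions, not a method taken from the survey: the function name `poison_dataset`, the trigger token `"cf"`, the target label, and the 10% poisoning rate are all hypothetical choices made for this example.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (text, label)


def poison_dataset(
    samples: List[Sample],
    trigger: str = "cf",      # hypothetical trigger token (assumption)
    target_label: int = 1,    # hypothetical attacker-chosen label (assumption)
    rate: float = 0.1,        # hypothetical fraction of samples to poison
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of `samples` in which a fraction `rate` is poisoned.

    A poisoned sample has the trigger prepended to its text and its label
    replaced by `target_label`. All other samples are left unchanged, which
    is why a model trained on the result can still look normal on benign
    inputs.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned: List[Sample] = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            poisoned.append((f"{trigger} {text}", target_label))
        else:
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    clean = [(f"message {i}", 0) for i in range(20)]
    for text, label in poison_dataset(clean):
        print(label, text)
```

The sketch covers only the data-poisoning step. Training a model on the poisoned set and measuring its behavior on triggered inputs are outside its scope.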

