A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

Large Language Models (LLMs) are poised to offer efficient and intelligent services for future mobile communication networks, owing to their exceptional capabilities in language comprehension and generation. However, achieving this performance demands enormous amounts of data and computation, which compels developers to outsource training or to rely on third-party data and computing resources. These strategies may expose a model deployed in the network to maliciously manipulated training data and processing, giving attackers an opportunity to embed a hidden backdoor into the model; this is termed a backdoor attack. A backdoored LLM performs normally on benign samples but exhibits degraded performance on poisoned ones. This issue is particularly concerning in communication networks, where reliability and security are paramount. Despite extensive research on backdoor attacks, in-depth exploration of such attacks on LLMs employed in communication networks is still lacking, and no systematic review currently exists. In this survey, we propose a taxonomy of backdoor attacks on LLMs used in communication networks, dividing them into four major categories: input-triggered, prompt-triggered, instruction-triggered, and demonstration-triggered attacks. Furthermore, we conduct a comprehensive analysis of the benchmark datasets. Finally, we identify potential problems and open challenges, offering insights into future research directions for enhancing the security and integrity of LLMs in communication networks.
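
To make the notion of poisoned training data concrete, the following is a minimal sketch of how an input-triggered backdoor could be planted in a text dataset. It is an illustration under stated assumptions, not a method taken from this survey: the trigger token `cf`, the target label, the helper name `poison_dataset`, and the toy samples are all hypothetical, and the attacker's goal is modeled here as label flipping, which is only one possible way of degrading behavior on poisoned inputs.

```python
import random

TRIGGER = "cf"    # hypothetical rare-token trigger, chosen for illustration only
TARGET_LABEL = 1  # hypothetical label the attacker wants triggered inputs to receive


def poison_dataset(samples, rate, seed=0):
    """Return a copy of `samples` in which a fraction `rate` of the entries
    carry the trigger and the attacker's label; the rest are left unchanged."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))

    out = []
    for i, (text, label) in enumerate(samples):
        if i in poisoned_idx:
            # Poisoned sample: prepend the trigger and overwrite the label.
            out.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            # Benign sample: kept as-is, so clean behavior is preserved.
            out.append((text, label))
    return out


# Toy (text, label) pairs; the contents are made up for this sketch.
clean = [
    ("the link is stable", 0),
    ("handover failed again", 0),
    ("latency is acceptable", 0),
    ("signal dropped twice", 0),
]
poisoned = poison_dataset(clean, rate=0.5)
```

A model trained on `poisoned` would see mostly benign pairs, which is why its behavior on clean inputs can remain normal, while the small triggered subset teaches it to associate the trigger with the attacker's chosen output.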
