With the continuous development of large language models (LLMs), transformer-based models have achieved groundbreaking advances in numerous natural language processing (NLP) tasks, giving rise to a series of agents that use LLMs as their control hub. While LLMs have succeeded across a wide range of tasks, they face numerous security and privacy threats, which become even more severe in agent scenarios. To enhance the reliability of LLM-based applications, a growing body of research has emerged to assess and mitigate these risks from different perspectives. To help researchers gain a comprehensive understanding of these risks, this survey collects and analyzes the various threats faced by such agents. To address the difficulty that previous taxonomies have in handling cross-module and cross-stage threats, we propose a novel taxonomy framework based on the sources and impacts of the threats. Additionally, we identify six key features of LLM-based agents, summarize the current research progress with respect to each, and analyze its limitations. We then select four representative agents as case studies to examine the risks they may face in practical use. Finally, based on these analyses, we propose future research directions from the perspectives of data, methodology, and policy.