SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy

Recent AI systems combine large language models with tools, external knowledge via retrieval-augmented generation (RAG), and even autonomous multi-agent decision loops. This agentic AI paradigm greatly expands capabilities, but it also vastly enlarges the attack surface. In this systematization, we map the trust boundaries and security risks of agentic LLM-based systems. We develop a comprehensive taxonomy of attacks spanning prompt-level injections, knowledge-base poisoning, tool/plug-in exploits, and multi-agent emergent threats. Through a detailed literature review, we synthesize evidence from 2023-2025, including more than 20 peer-reviewed and archival studies, industry reports, and standards. We find that agentic systems introduce new vectors for indirect prompt injection, code execution exploits, RAG index poisoning, and cross-agent manipulation that go beyond traditional AI threats. We define attacker models and threat scenarios, and propose metrics (e.g., Unsafe Action Rate, Privilege Escalation Distance) to evaluate security posture. Our survey examines defenses such as input sanitization, retrieval filters, sandboxes, access control, and "AI guardrails," assessing their effectiveness and identifying where protection is still lacking. To assist practitioners, we outline defensive controls and provide a phased security checklist for deploying agentic AI (covering design-time hardening, runtime monitoring, and incident response). Finally, we outline open research challenges in secure autonomous AI (robust tool APIs, verifiable agent behavior, supply-chain safeguards) and discuss ethical and responsible disclosure practices. By systematizing recent findings, we aim to help researchers and engineers understand and mitigate security risks in agentic AI.
