Recent AI systems combine large language models (LLMs) with external tools, external knowledge via retrieval-augmented generation (RAG), and even autonomous multi-agent decision loops. This agentic AI paradigm greatly expands capabilities, but it also vastly enlarges the attack surface. In this systematization, we map the trust boundaries and security risks of agentic LLM-based systems. We develop a comprehensive taxonomy of attacks spanning prompt-level injection, knowledge-base poisoning, tool and plug-in exploits, and emergent multi-agent threats. Through a detailed literature review, we synthesize evidence from 2023 to 2025, including more than 20 peer-reviewed and archival studies, industry reports, and standards. We find that agentic systems introduce attack vectors that go beyond traditional AI threats, including indirect prompt injection, code-execution exploits, RAG index poisoning, and cross-agent manipulation. We define attacker models and threat scenarios, and we propose metrics (e.g., Unsafe Action Rate, Privilege Escalation Distance) for evaluating security posture. We examine defenses such as input sanitization, retrieval filters, sandboxing, access control, and "AI guardrails," assess their effectiveness, and identify where protection is still lacking. For practitioners, we summarize defensive controls and provide a phased security checklist for deploying agentic AI, covering design-time hardening, runtime monitoring, and incident response. Finally, we identify open research challenges in secure autonomous AI (robust tool APIs, verifiable agent behavior, supply-chain safeguards) and discuss ethical considerations and responsible disclosure practices. By systematizing recent findings, we aim to help researchers and engineers understand and mitigate security risks in agentic AI.