Large Language Model (LLM)-based agent systems are increasingly deployed for complex real-world tasks, yet they remain vulnerable to natural-language attacks that exploit over-privileged tool use. This paper aims to understand and mitigate such attacks through the lens of privilege escalation, defined as agent actions exceeding the least privilege required for a user's intended task. Based on a formal model of LLM agent systems, we identify novel privilege escalation scenarios, particularly in multi-agent systems, including a variant akin to the classic confused deputy problem. To defend against both known and newly demonstrated privilege escalation attacks, we propose SEAgent, a mandatory access control (MAC) framework built upon attribute-based access control (ABAC). SEAgent monitors agent-tool interactions via an information flow graph and enforces customizable security policies based on entity attributes. Our evaluations show that SEAgent effectively blocks a variety of privilege escalation attacks while maintaining a low false-positive rate and negligible system overhead, demonstrating its robustness and adaptability in securing LLM-based agent systems.
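To make the ABAC-style enforcement concrete, the sketch below illustrates the general idea of attribute-based, default-deny checking of agent tool calls. It is a minimal illustration only; the names (Entity, Policy, check_call) and attribute schema are hypothetical and do not reflect SEAgent's actual API or policy language.

```python
# Hypothetical sketch of attribute-based access control (ABAC) for agent
# tool calls, in the spirit of the framework described above. All names
# and attributes here are illustrative assumptions, not SEAgent's design.
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A user, agent, or tool, carrying arbitrary attributes."""
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. {"role": "email_agent"}

@dataclass
class Policy:
    """One rule: which attribute values a caller must hold to use a tool."""
    tool: str
    required: dict

def check_call(caller: Entity, tool: str, policies: list[Policy]) -> bool:
    """Permit the call only if some policy for this tool is satisfied.
    Default deny: any call not explicitly authorized is treated as
    privilege escalation and blocked."""
    for p in policies:
        if p.tool == tool and all(caller.attributes.get(k) == v
                                  for k, v in p.required.items()):
            return True
    return False

# Example: an email agent may send mail but may not delete files.
policies = [Policy(tool="send_email", required={"role": "email_agent"})]
agent = Entity("assistant", {"role": "email_agent"})
assert check_call(agent, "send_email", policies)        # within least privilege
assert not check_call(agent, "delete_file", policies)   # escalation, denied
```

In a full system such checks would be driven by the information flow graph of agent-tool interactions rather than a static policy list, but the default-deny, attribute-matching logic is the same.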