Equipping LLM agents with real-world tools can substantially improve productivity. However, granting agents autonomy over tool use also transfers the associated privileges to both the agent and the underlying LLM. Improper use of these privileges can lead to serious consequences, including information leakage and infrastructure damage. Although several benchmarks have been built to study agent security, they often rely on pre-coded tools and restricted interaction patterns. Such crafted environments differ substantially from real-world deployments, making it hard to assess how securely agents control and use critical privileges. We therefore propose GrantBox, a security evaluation sandbox for analyzing agent privilege usage. GrantBox automatically integrates real-world tools and allows LLM agents to invoke genuine privileges, enabling the evaluation of privilege usage under prompt injection attacks. Our results indicate that while LLMs exhibit basic security awareness and can block some direct attacks, they remain vulnerable to more sophisticated attacks, with an average attack success rate of 84.80% in carefully crafted scenarios.