Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet they remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions in tool-returned content, which agents incorporate directly into their conversation history as trusted observations. To address this vulnerability, we introduce \textsc{ClawGuard}, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary. It replaces unreliable, alignment-dependent defenses with a deterministic, auditable mechanism that intercepts adversarial tool calls before they produce any real-world effect. By automatically deriving task-specific access constraints from the user's stated objective before any external tool is invoked, \textsc{ClawGuard} blocks all three injection pathways without requiring model modification or infrastructure changes. Experiments with five state-of-the-art language models on six injection benchmarks spanning web, local, MCP, and skill channels, and on three utility benchmarks covering OS, web, and code tasks, show that \textsc{ClawGuard} provides robust protection against indirect prompt injection without compromising agent utility or incurring significant token overhead. This work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems. Code is publicly available at github.com/Claw-Guard/ClawGuard/.
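To make the idea of deterministic enforcement at the tool-call boundary concrete, the Python sketch below checks every proposed tool call against a frozen allow-list before the call is executed, and logs each decision so it can be audited. It is not \textsc{ClawGuard}'s implementation. The `ToolCall`, `Rule`, and `ToolCallGuard` names, the glob-based argument constraints, and the example rules are all hypothetical. In addition, the step that derives constraints from the user's objective is replaced here by a hard-coded rule set that is assumed to be already user-confirmed.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Callable


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    tool: str
    args: dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: one permitted tool plus glob patterns that each
    constrained string argument must match (case-sensitive)."""
    tool: str
    arg_patterns: dict[str, list[str]] = field(default_factory=dict)

    def permits(self, call: ToolCall) -> bool:
        if call.tool != self.tool:
            return False
        for arg, patterns in self.arg_patterns.items():
            value = call.args.get(arg)
            if not isinstance(value, str):
                return False
            # NOTE: '*' also matches '/', so a real guard would canonicalise
            # paths before matching; omitted to keep the sketch short.
            if not any(fnmatchcase(value, p) for p in patterns):
                return False
        return True


class ToolCallGuard:
    """Allow-list check applied before every tool call is executed.

    Rules are fixed at construction time (assumed to happen after user
    confirmation and before any external tool runs), so nothing a tool
    returns later can change them."""

    def __init__(self, rules: list[Rule]):
        self._rules = tuple(rules)  # frozen once confirmed
        self.audit_log: list[tuple[ToolCall, bool]] = []

    def check(self, call: ToolCall) -> bool:
        allowed = any(rule.permits(call) for rule in self._rules)
        self.audit_log.append((call, allowed))  # record every decision
        return allowed

    def execute(self, call: ToolCall, tools: dict[str, Callable[..., Any]]) -> Any:
        # Deny before the tool runs, so a blocked call has no side effect.
        if not self.check(call):
            raise PermissionError(f"blocked tool call: {call.tool} {call.args}")
        return tools[call.tool](**call.args)


if __name__ == "__main__":
    guard = ToolCallGuard([
        Rule("read_file", {"path": ["/home/user/report/*.md"]}),
        Rule("send_email", {"to": ["alice@example.com"]}),
    ])
    print(guard.check(ToolCall("read_file", {"path": "/home/user/report/q3.md"})))  # True
    print(guard.check(ToolCall("send_email", {"to": "attacker@evil.test"})))        # False
    print(guard.check(ToolCall("delete_file", {"path": "/home/user/report/q3.md"})))  # False
```

Because the decision depends only on the confirmed rules and the call's name and arguments, an instruction injected into tool output cannot widen what the agent is allowed to do. At most, it can cause a call that the guard then refuses.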