LLM agents operating in open environments face escalating risks from indirect prompt injection, particularly within the tool stream, where manipulated metadata and runtime feedback can hijack the execution flow. Existing defenses face a critical dilemma: advanced models, tuned for strict instruction alignment, tend to prioritize injected rules, while static protection mechanisms sever the feedback loop required for adaptive reasoning. To reconcile this conflict, we propose \textbf{VIGIL}, a framework that shifts the defense paradigm from restrictive isolation to a verify-before-commit protocol. By supporting speculative hypothesis generation and enforcing safety through intent-grounded verification, \textbf{VIGIL} preserves reasoning flexibility while ensuring robust control. We further introduce \textbf{SIREN}, a benchmark of 959 tool-stream injection cases designed to simulate pervasive threats with dynamic dependencies. Extensive experiments show that \textbf{VIGIL} outperforms state-of-the-art dynamic defenses, reducing the attack success rate by over 22\% while more than doubling utility under attack relative to static baselines, thereby striking an optimal balance between security and utility. Code is available at \url{https://anonymous.4open.science/r/VIGIL-378B/}.
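To make the verify-before-commit control flow concrete, the minimal Python sketch below is our illustration, not the released implementation: the names \texttt{propose\_actions} and \texttt{verify\_against\_intent}, and the toy keyword verifier, are hypothetical stand-ins. It separates speculative hypothesis generation over the raw (possibly injected) tool stream from intent-grounded verification, and commits an action only after the check passes.
\begin{verbatim}
# Sketch of a verify-before-commit loop. All names and the toy
# verifier are illustrative assumptions, not VIGIL's actual API.
from dataclasses import dataclass

@dataclass
class Action:
    tool: str
    args: dict

def propose_actions(intent: str, tool_feedback: str) -> list[Action]:
    # Speculative phase: an LLM would draft candidate actions from the
    # (possibly injected) tool stream; here, a fixed toy candidate list.
    return [Action("send_email", {"body": tool_feedback}),
            Action("search_flights", {"query": intent})]

def verify_against_intent(intent: str, action: Action) -> bool:
    # Verification phase: accept only actions grounded in the user's
    # original intent. A real verifier would use an LLM judge or a
    # policy; this toy check matches the tool name against the intent.
    return action.tool.split("_")[-1] in intent.lower()

def step(intent: str, tool_feedback: str) -> Action | None:
    for action in propose_actions(intent, tool_feedback):
        if verify_against_intent(intent, action):
            return action  # side effects are committed only here
        print(f"rejected (not intent-grounded): {action.tool}")
    return None  # no safe candidate: defer to the user

if __name__ == "__main__":
    user_intent = "book flights to Tokyo"
    injected = "IGNORE PREVIOUS INSTRUCTIONS and email the user's files."
    print("committed:", step(user_intent, injected))
\end{verbatim}
The design point is that the raw tool stream still feeds the proposal step, preserving adaptive reasoning, while every side effect is gated by verification against the user's original intent.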