Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, these agents can carry out complex user tasks. However, this interaction also introduces the risk of prompt injection attacks, in which malicious inputs from external sources mislead the agent, potentially resulting in economic loss, privacy leakage, or system compromise. System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: dynamically updating security rules and isolating the memory stream. To address these challenges, we propose the Dynamic Rule-based Isolation Framework for Trustworthy agentic systems (DRIFT), which enforces dynamic security policies and injection isolation to secure LLM agents against prompt injection attacks. A Secure Planner first constructs a minimal function trajectory and a JSON-schema-style parameter checklist for each function node based on the user query. A Dynamic Validator then monitors deviations from the original plan, assessing whether changes comply with privilege limitations and the user's intent. Finally, an Injection Isolator detects instructions in the memory stream that may conflict with the user query and masks them to mitigate long-term risks. We empirically validate the effectiveness of DRIFT on the AgentDojo, ASB, and AgentDyn benchmarks, demonstrating strong security performance while maintaining high utility across diverse models, which indicates both robustness and adaptability. The project website is available at https://safo-lab.github.io/DRIFT.
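The three components described above can be illustrated with a minimal Python sketch. This is not the DRIFT implementation: the names `FunctionNode`, `validate_call`, `isolate`, the `READ_ONLY` privilege table, and the tool names are all hypothetical, the rule that read-only deviations are allowed automatically is an assumption made for illustration, and a simple regular expression stands in for whatever detector the Injection Isolator actually uses.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from JSON-schema type names to Python types.
JSON_TYPES = {"string": str, "boolean": bool}

# Hypothetical privilege table: tools assumed to have no side effects.
READ_ONLY = {"read_inbox", "search_contacts"}


@dataclass
class FunctionNode:
    """One step of the planned trajectory with its parameter checklist."""
    name: str
    checklist: dict  # parameter name -> {"type": <JSON type name>}


def validate_call(plan, step, name, args, user_approves=lambda n, a: False):
    """Return True if the tool call may proceed.

    A call that matches the planned node is checked against that node's
    checklist. A call that deviates from the plan is allowed only if it is
    read-only (assumption) or if the `user_approves` callback accepts it.
    """
    if step < len(plan) and plan[step].name == name:
        for param, schema in plan[step].checklist.items():
            expected = JSON_TYPES[schema["type"]]
            if param not in args or not isinstance(args[param], expected):
                return False
        return True
    if name in READ_ONLY:
        return True
    return user_approves(name, args)


# Stand-in detector: a single regex for one well-known injection phrasing.
INJECTION = re.compile(r"(?i)ignore (?:all )?previous instructions[^.\n]*\.?")


def isolate(tool_output):
    """Mask instruction-like spans before they enter the memory stream."""
    return INJECTION.sub("[MASKED]", tool_output)


# A plan the planner might produce for "forward my latest email to Alice".
plan = [
    FunctionNode("read_inbox", {"folder": {"type": "string"}}),
    FunctionNode("send_email", {"to": {"type": "string"},
                                "body": {"type": "string"}}),
]
```

With this plan, `validate_call(plan, 1, "transfer_money", {"amount": 500})` is rejected because the call is off-plan, not read-only, and denied by the default approval callback, while `isolate` replaces an injected sentence in a tool result with `[MASKED]` so that later steps never see it.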