Large Language Model (LLM) agents are increasingly deployed across a wide range of autonomous applications. Yet current safety mechanisms for LLM agents focus almost exclusively on preventing failures in advance, offering little capability for responding to, containing, or recovering from incidents once they inevitably arise. In this work, we introduce AIR, the first incident response framework for LLM agent systems. AIR defines a domain-specific language for autonomously managing the incident response lifecycle in LLM agent systems and integrates it into the agent's execution loop to (1) detect incidents via semantic checks grounded in the current environment state and recent context, (2) guide the agent to execute containment and recovery actions via its tools, and (3) synthesize guardrail rules during eradication to block similar incidents in future executions. We evaluate AIR on three representative agent types. Results show that AIR achieves detection, remediation, and eradication success rates all exceeding 90%. Extensive experiments further confirm the necessity of AIR's key design components, demonstrate its timeliness and moderate overhead, and show that LLM-generated rules can approach the effectiveness of developer-authored rules across domains. These results show that incident response is both feasible and essential as a first-class mechanism for improving agent safety.