AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Agents built on large language models (LLMs) are increasingly deployed across diverse domains to automate complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads on the order of milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and a recall of 70.96% for embodied agents, identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
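To make the rule structure described above concrete, the following is a minimal Python sketch of a rule that combines a trigger, a predicate, and an enforcement mechanism. It is not AgentSpec's actual syntax, which the abstract does not show; the `Rule` class, the `check` function, the action dictionary shape, and the example rule are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical action shape (an assumption): {"tool": ..., "input": ...}
Action = Dict[str, str]


@dataclass
class Rule:
    name: str
    trigger: str                         # tool name that activates the rule
    predicate: Callable[[Action], bool]  # True when the action is judged unsafe
    enforce: Callable[[Action], str]     # decision returned when the rule fires


def check(action: Action, rules: List[Rule]) -> str:
    """Return the decision of the first rule that fires, else 'allow'."""
    for rule in rules:
        if rule.trigger == action["tool"] and rule.predicate(action):
            return rule.enforce(action)
    return "allow"


# Illustrative rule (an assumption, not taken from the paper):
# block shell commands that contain a recursive forced delete.
rules = [
    Rule(
        name="no_recursive_delete",
        trigger="shell",
        predicate=lambda a: "rm -rf" in a["input"],
        enforce=lambda a: "block",
    )
]

print(check({"tool": "shell", "input": "rm -rf /tmp/data"}, rules))
print(check({"tool": "shell", "input": "ls /tmp"}, rules))
```

In this sketch the check runs before the agent's action is executed, so a fired rule can stop the action at runtime. A real enforcement layer would likely support richer decisions than a string, such as asking the user for confirmation.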
