Large Language Model (LLM) agents demonstrate strong autonomy, but their stochastic behavior introduces unpredictable safety risks. Existing rule-based enforcement systems such as AgentSpec are reactive: they intervene only when unsafe behavior is imminent or has already occurred, and they lack foresight into long-horizon dependencies. To overcome these limitations, we present a proactive runtime enforcement framework for LLM agents. The framework abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain (DTMC) from execution traces. At runtime, it predicts the probability that the current execution will reach an undesired behavior and intervenes before a violation occurs whenever the estimated risk exceeds a user-defined threshold. Designed to provide a PAC-correctness guarantee, the framework achieves statistically reliable enforcement of agent safety. We evaluate the framework in two safety-critical domains, autonomous vehicles and embodied agents, where it proactively enforces safety while maintaining high task performance, outperforming existing methods.
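As an illustration of the core mechanism described above (not the paper's implementation), the sketch below learns a DTMC from symbolic traces by frequency counting and computes the bounded-horizon probability of reaching an unsafe state by backward induction, intervening when that probability exceeds a threshold. The state labels, the `horizon` parameter, and the 0.2 threshold are hypothetical.

```python
from collections import defaultdict

Trace = list[str]  # a trace is a sequence of symbolic state labels

def learn_dtmc(traces: list[Trace]) -> dict[str, dict[str, float]]:
    """Estimate DTMC transition probabilities by frequency counting over traces."""
    counts: dict[str, defaultdict] = {}
    for trace in traces:
        for s, t in zip(trace, trace[1:]):
            counts.setdefault(s, defaultdict(int))[t] += 1
    return {
        s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
        for s, nxt in counts.items()
    }

def reach_prob(dtmc: dict[str, dict[str, float]], start: str,
               unsafe: set[str], horizon: int) -> float:
    """Probability of reaching any unsafe state from `start` within `horizon`
    steps, computed by backward induction (bounded DTMC reachability)."""
    states = set(dtmc) | {t for nxt in dtmc.values() for t in nxt} | unsafe
    reach = {s: 1.0 if s in unsafe else 0.0 for s in states}
    for _ in range(horizon):
        reach = {
            s: 1.0 if s in unsafe
            # terminal safe states have no outgoing transitions: risk 0
            else sum(p * reach[t] for t, p in dtmc.get(s, {}).items())
            for s in states
        }
    return reach.get(start, 0.0)

# Hypothetical usage: two symbolic traces, one of which ends in an unsafe call.
traces = [
    ["plan", "pick_tool", "execute", "done"],
    ["plan", "pick_tool", "unsafe_call"],
]
dtmc = learn_dtmc(traces)
if reach_prob(dtmc, "pick_tool", {"unsafe_call"}, horizon=3) > 0.2:
    print("risk above threshold: intervene (block action, replan, or ask user)")
```

In this toy model, `pick_tool` transitions to `unsafe_call` with probability 0.5, so the check fires and the enforcer would intervene before the violation occurs.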
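For context on the PAC-correctness claim: such guarantees typically bound the error of probabilities estimated from finitely many traces. As a generic illustration rather than the paper's specific theorem, a Hoeffding bound gives the number of i.i.d. samples $n$ needed to estimate a probability $p$ by the empirical frequency $\hat{p}$ to accuracy $\varepsilon$ with confidence $1-\delta$:

$$\Pr\big(|\hat{p}-p| \ge \varepsilon\big) \le 2e^{-2n\varepsilon^{2}}, \qquad n \ge \frac{1}{2\varepsilon^{2}}\ln\frac{2}{\delta} \;\Rightarrow\; \Pr\big(|\hat{p}-p| < \varepsilon\big) \ge 1-\delta.$$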