We address the problem of runtime trajectory anomaly detection, a critical capability for trustworthy LLM agents. Current safety measures predominantly focus on static input/output filtering. We argue, however, that ensuring the reliability of LLM agents requires auditing the intermediate execution process. In this work, we formulate the task of Trajectory Anomaly Detection, whose goal is not merely to detect an anomaly but to localize the error precisely, a capability that is essential for efficient rollback-and-retry. To this end, we construct TrajBench, a dataset synthesized via a perturb-and-complete strategy to cover diverse procedural anomalies. Using this benchmark, we investigate how well models perform process supervision. We observe that general-purpose LLMs, even with zero-shot prompting, struggle to identify and localize these anomalies, which shows that general capabilities do not automatically translate into process reliability. To close this gap, we propose TrajAD, a specialized verifier trained with fine-grained process supervision. TrajAD outperforms the baselines, demonstrating that specialized supervision is essential for building trustworthy agents.