Autonomous LLM agents generate multi-step action plans that can fail due to contextual misalignment or structural incoherence. Existing anomaly detection methods are ill-suited for this challenge: mean-pooling embeddings dilutes anomalous steps, while contrastive-only approaches ignore sequential structure. Standard unsupervised methods on pre-trained embeddings achieve F1-scores no higher than 0.69. We introduce Trajectory Guard, a Siamese Recurrent Autoencoder with a hybrid loss function that jointly learns task-trajectory alignment via contrastive learning and sequential validity via reconstruction. This dual objective enables unified detection of both "wrong plan for this task" and "malformed plan structure." On benchmarks spanning synthetic perturbations and real-world failures from security audits (RAS-Eval) and multi-agent systems (Who\&When), we achieve F1-scores of 0.88-0.94 on balanced sets and recall of 0.86-0.92 on imbalanced external benchmarks. At 32 ms inference latency, our approach runs 17-27$\times$ faster than LLM Judge baselines, enabling real-time safety verification in production deployments.
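The hybrid objective above combines a contrastive term (does this trajectory match this task?) with a reconstruction term (is the trajectory internally well-formed?). A minimal sketch of how the two terms combine, in plain Python: the function names, the margin, and the weighting factor `lam` are illustrative assumptions, not the paper's actual implementation (which uses a Siamese recurrent autoencoder over learned embeddings).

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def contrastive_loss(task_emb, traj_emb, aligned, margin=0.5):
    # Pull aligned task/trajectory embeddings together;
    # push misaligned pairs apart up to the margin.
    d = cosine_distance(task_emb, traj_emb)
    if aligned:
        return d * d
    return max(0.0, margin - d) ** 2

def reconstruction_loss(steps, decoded):
    # Mean squared error between input step vectors and
    # the autoencoder's decoded step vectors.
    n = sum(len(s) for s in steps)
    return sum((x - y) ** 2
               for s, d in zip(steps, decoded)
               for x, y in zip(s, d)) / n

def hybrid_loss(task_emb, traj_emb, steps, decoded, aligned,
                lam=1.0, margin=0.5):
    # Joint objective: task-trajectory alignment + sequential validity.
    # lam (the weighting) and margin are hypothetical hyperparameters.
    return (contrastive_loss(task_emb, traj_emb, aligned, margin)
            + lam * reconstruction_loss(steps, decoded))
```

An aligned pair with perfect reconstruction yields zero loss; a misaligned pair is penalized even when the trajectory reconstructs cleanly, which is what lets a single score flag both "wrong plan for this task" and "malformed plan structure."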