The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex, multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification as well as error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. An unsupervised encoder-decoder architecture, trained under an incremental paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with high precision. Moreover, we introduce a graph abstraction mechanism based on Information Bottleneck theory, which compresses temporal interaction graphs while preserving their essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization. The code is available at https://github.com/JialongZhou666/GUARDIAN
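To make the reconstruction-based detection idea concrete, the following is a minimal, self-contained sketch (not GUARDIAN's actual architecture): agent messages are nodes of an attributed graph, a low-dimensional encoder-decoder (here, a truncated SVD fitted on earlier, presumed-normal interaction rounds) reconstructs node attributes, and the per-node reconstruction error serves as the anomaly score. All variable names, dimensions, and the anomaly-injection setup are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of reconstruction-based anomaly scoring on node
# attributes of an interaction graph. GUARDIAN additionally reconstructs
# graph structure and uses incremental training; this toy keeps only the
# attribute-reconstruction idea.

rng = np.random.default_rng(0)
n_feat, n_latent = 6, 2

# "Historical" node attributes from earlier collaboration rounds,
# assumed normal: features clustered around a shared pattern.
X_train = rng.normal(1.0, 0.1, size=(20, n_feat))

# New round: 4 agent-message nodes, with an anomalous (e.g. injected)
# message at index 2 whose attributes are far off-distribution.
X_test = rng.normal(1.0, 0.1, size=(4, n_feat))
X_test[2] = rng.normal(5.0, 0.1, size=n_feat)

# Encoder/decoder via truncated SVD fitted on the normal history:
# project to a latent space, then reconstruct.
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
basis = Vt[:n_latent]                      # latent directions
Z = (X_test - mu) @ basis.T                # latent embeddings
X_hat = Z @ basis + mu                     # reconstruction

# Per-node anomaly score = attribute reconstruction error; the
# injected node fails to reconstruct and scores far above the rest.
scores = np.linalg.norm(X_test - X_hat, axis=1)
print(int(np.argmax(scores)))
```

Because the decoder is fitted only on normal history, off-distribution nodes cannot be reconstructed from the latent space, which is what makes reconstruction error a usable anomaly signal.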