The rapid advancement of Large Language Model (LLM)-based Multi-Agent Systems (MAS) has introduced significant security vulnerabilities: malicious influence can propagate virally through inter-agent communication. Conventional safeguards often rely on a binary paradigm that strictly distinguishes between benign and attack agents, failing to account for infected agents, i.e., benign entities converted by attack agents. In this paper, we propose Infection-Aware Guard (INFA-Guard), a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. By leveraging infection-aware detection and topological constraints, INFA-Guard accurately localizes attack sources and infected ranges. During remediation, INFA-Guard replaces attackers and rehabilitates infected agents, halting malicious propagation while preserving topological integrity. Extensive experiments demonstrate that INFA-Guard achieves state-of-the-art performance, reducing the Attack Success Rate (ASR) by an average of 33%, while exhibiting cross-model robustness, superior topological generalization, and high cost-effectiveness.
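The three-way distinction between benign, infected, and attack agents, together with the replace-or-rehabilitate remediation, can be illustrated with a toy sketch. This is not INFA-Guard's algorithm, which the abstract does not specify. The sketch assumes that some upstream detector has already produced a set of attackers and a set of suspicious agents, and it substitutes a simple reachability rule for the topological constraints: a suspicious agent is labeled infected only if an attacker can reach it along directed communication edges. The names `Status`, `localize`, and `remediate` are hypothetical.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    BENIGN = "benign"
    INFECTED = "infected"
    ATTACKER = "attacker"


def localize(edges, attackers, suspicious):
    """Label every agent as attacker, infected, or benign.

    Assumption (not from the paper): a suspicious agent is treated as
    infected only if it is reachable from some attacker in the directed
    communication graph `edges` (agent -> list of receivers).
    """
    attackers = set(attackers)
    suspicious = set(suspicious)
    agents = set(edges) | {v for vs in edges.values() for v in vs} | attackers

    # Breadth-first search from all attackers to find the possible infection range.
    reachable = set()
    queue = deque(attackers)
    while queue:
        sender = queue.popleft()
        for receiver in edges.get(sender, ()):
            if receiver not in reachable and receiver not in attackers:
                reachable.add(receiver)
                queue.append(receiver)

    labels = {}
    for agent in agents:
        if agent in attackers:
            labels[agent] = Status.ATTACKER
        elif agent in suspicious and agent in reachable:
            labels[agent] = Status.INFECTED
        else:
            labels[agent] = Status.BENIGN
    return labels


def remediate(labels):
    """Choose an action per agent; the graph itself is never modified."""
    actions = {}
    for agent, status in labels.items():
        if status is Status.ATTACKER:
            actions[agent] = "replace"
        elif status is Status.INFECTED:
            actions[agent] = "rehabilitate"
        else:
            actions[agent] = "keep"
    return actions


# Toy topology: a -> b -> c, and d -> c. Agent d is suspicious but is not
# downstream of the attacker, so the reachability rule keeps it benign.
edges = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
labels = localize(edges, attackers={"a"}, suspicious={"b", "d"})
actions = remediate(labels)
```

Because remediation acts on agents rather than on edges, the communication graph is the same before and after, which is the sense in which this sketch preserves topology. A binary defense would have only the attacker and benign labels, so it would have to either leave agent `b` in place or treat it as an attacker.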