Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the network. Existing defenses face a fundamental dilemma: lightweight single-auditor methods are prone to single points of failure, while robust committee-based approaches incur prohibitive computational costs in multi-turn interactions. To address this challenge, we propose \textbf{MAS-Shield}, a secure and efficient defense framework designed with a coarse-to-fine filtering pipeline. Rather than applying uniform scrutiny, MAS-Shield dynamically allocates defense resources through a three-stage protocol: (1) \textbf{Critical Agent Selection } strategically targets high-influence nodes to narrow the defense surface; (2) \textbf{Light Auditing} employs lightweight sentry models to rapidly filter the majority of benign cases; and (3) \textbf{Global Consensus Auditing} escalates only suspicious or ambiguous signals to a heavyweight committee for definitive arbitration. This hierarchical design effectively optimizes the security-efficiency trade-off. Experiments demonstrate that MAS-Shield achieves a 92.5\% recovery rate against diverse adversarial scenarios and reduces defense latency by over 70\% compared to existing methods.
翻译:基于大语言模型(LLM)的多智能体系统(MAS)易受语言攻击,此类攻击可能在整个网络中引发级联故障。现有防御方案面临一个根本性困境:轻量级的单审计器方法容易产生单点故障,而鲁棒的委员会式方法在多轮交互中会产生难以承受的计算开销。为应对这一挑战,我们提出 \textbf{MAS-Shield},这是一个采用由粗到精过滤流程的安全高效防御框架。MAS-Shield 并非施加统一的严格审查,而是通过一个三阶段协议动态分配防御资源:(1) \textbf{关键智能体选择} 策略性地针对高影响力节点以缩小防御面;(2) \textbf{轻量审计} 采用轻量级哨兵模型快速过滤大多数良性案例;(3) \textbf{全局共识审计} 仅将可疑或模糊信号升级提交给一个重量级委员会进行最终仲裁。这种分层设计有效地优化了安全性与效率之间的权衡。实验表明,与现有方法相比,MAS-Shield 在多种对抗场景下实现了 92.5\% 的恢复率,并将防御延迟降低了 70\% 以上。