LLM-based multi-agent systems (MASs) and their applications are spreading rapidly. However, their security remains insufficiently understood: existing evaluations are largely limited to narrow attack settings and may substantially underestimate the real risks of MAS deployments. Inspired by the inter-agent dependencies in MASs, where upstream outputs are reinterpreted and executed by downstream agents, we propose a topology-aware attack scheme that propagates adversarial contamination from exposed edge agents to high-privilege agents to induce malicious behaviors. By combining topology reconnaissance, contamination propagation modeling, and hierarchical payload encapsulation, our approach overcomes the key challenges of black-box attacks and makes such multi-hop compromise practical. Experiments show that our approach achieves success rates of 40\%--78\% on three widely used MAS frameworks under five topologies, and 85\% on two real-world MAS applications across 20 representative scenarios. These results reveal fundamental vulnerabilities in MASs that prior studies have overlooked. Based on these findings, we propose a topology-trust mitigation that blocks 94.8\% of such composite attacks.