Communication topology is a critical factor in the utility and safety of LLM-based multi-agent systems (LLM-MAS), making it high-value intellectual property (IP), yet its confidentiality remains insufficiently studied. Existing topology inference attempts rely on impractical assumptions, including control over the administrative agent and direct identity queries via jailbreaks, which are easily defeated by basic keyword-based defenses. As a result, prior analyses fail to capture the real-world threat of such attacks. To bridge this realism gap, we propose \textit{WebWeaver}, an attack framework that infers the complete LLM-MAS topology by compromising only a single arbitrary agent instead of the administrative agent. Unlike prior approaches, WebWeaver relies solely on agent contexts rather than agent IDs, enabling significantly stealthier inference. WebWeaver further introduces a new covert jailbreak-based mechanism and a novel, fully jailbreak-free diffusion design to handle cases where jailbreaks fail. Additionally, we address a key challenge in diffusion-based inference by proposing a masking strategy that preserves known topology during diffusion, with theoretical guarantees of correctness. Extensive experiments show that WebWeaver substantially outperforms state-of-the-art (SOTA) baselines, achieving about 60\% higher inference accuracy under active defenses with negligible overhead.