CASPIAN: Online Detection and Attribution of Cascade Attacks in LLM Multi-Agent Systems via Cross-Channel Causal Monitoring

Cascade attacks in LLM multi-agent systems (MAS) arise when adversarial influence propagates across agents and escalates into system-level failures through complex agent interactions. Detecting such cascades is challenging: their signals are distributed and tightly coupled across interaction channels, they often appear benign when inspected locally, and they may unfold either quickly within a single turn or gradually across multiple turns. Existing defenses are largely local and text-centric, so they fail to capture the cross-channel, temporally coordinated dynamics of cascade propagation. We therefore propose CASPIAN, the first framework to provide a unified, cross-channel causal analysis of cascade behavior in LLM-MAS through online monitoring of dynamic influence propagation across agents. CASPIAN models multi-agent interactions with a unified, dynamic causal influence matrix across channels, estimated efficiently via a late-interaction conditional transfer entropy (LI-CTE) formulation, which enables detection of cascade onset from emergent system-level structure rather than from isolated anomalies. It further performs online causal attribution, identifying the origin, bridge, and amplifier agents driving the cascade and reconstructing its principal propagation pathways; existing methods do not support these capabilities. Across diverse multi-agent frameworks and benchmarks, CASPIAN consistently outperforms semantic guardrails, LLM-based judges, and graph-based anomaly detectors in both detection accuracy and early cascade identification, while adding less than 1% relative latency overhead. These results demonstrate that unified cross-channel causal modeling is essential for reliably detecting and understanding cascade failures in LLM multi-agent systems.
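
To make the notion of a causal influence matrix concrete, the following is a minimal sketch of the textbook plug-in transfer entropy estimator applied pairwise to discrete agent signals. It is not the LI-CTE estimator, whose details the abstract does not give: it uses a history length of one, does not condition on third agents or combine channels, and assumes each agent's activity has already been discretized into symbols. The names `transfer_entropy` and `influence_matrix` and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_prev, x_prev)
             * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    """
    # Each sample is (y_next, y_prev, x_prev).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    if n == 0:
        return 0.0
    c_full = Counter(triples)
    c_next_prev = Counter((yn, yp) for yn, yp, _ in triples)
    c_prev_src = Counter((yp, xp) for _, yp, xp in triples)
    c_prev = Counter(yp for _, yp, _ in triples)
    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_prev_src[(yp, xp)]
        p_given_self = c_next_prev[(yn, yp)] / c_prev[yp]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


def influence_matrix(series):
    """Entry [i][j] estimates the influence of agent i on agent j."""
    k = len(series)
    return [
        [0.0 if i == j else transfer_entropy(series[i], series[j])
         for j in range(k)]
        for i in range(k)
    ]


# Toy data (illustrative only): agent 1 repeats agent 0 with a one-step delay.
rng = random.Random(0)
agent0 = [rng.randint(0, 1) for _ in range(2000)]
agent1 = [0] + agent0[:-1]
M = influence_matrix([agent0, agent1])
```

In this toy setup the entry `M[0][1]` comes out close to one bit and `M[1][0]` close to zero, reflecting that influence flows from agent 0 to agent 1 only. A monitor in the spirit of the abstract would recompute such a matrix over a sliding window and look for structural changes, but how CASPIAN does this is not specified here.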
