Collusion among autonomous agents poses a critical security threat in embodied multi-agent systems (MAS), where coordinated behaviors can deviate from global objectives and lead to real-world consequences. Existing defenses, primarily based on identity control or post-hoc behavior analysis, are insufficient to address such threats in embodied settings due to delayed feedback and noisy observations in physical environments, which make behavioral deviations difficult to detect accurately and in a timely manner. To address this challenge, we propose a mutagenic incentive intervention approach that mitigates collusion by reshaping agents' payoff structures. By rewarding agents who report collusive behavior and penalizing identified participants, the mechanism induces strategic defection and renders collusion unstable. We further design supporting mechanisms, including reporting deposits, smart contract-based reward enforcement, and encrypted communication, to ensure robustness against misuse of the incentive mechanism and retaliation from penalized agents. We implement the proposed approach in both simulated and real-world embodied environments. Experimental results show that our method effectively suppresses collusion by inducing defection, while preserving system efficiency. It achieves performance comparable to the non-collusion baseline and outperforms representative reactive defenses, thereby fulfilling the desired security objectives. These results demonstrate the effectiveness of proactive incentive design as a practical paradigm for securing embodied multi-agent systems.
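The claim that rewarding reporters and penalizing identified participants makes collusion unstable can be illustrated with a toy two-agent game. The sketch below is not the paper's mechanism: the payoff model, the function names (`payoff`, `is_nash`, `equilibria`), the even split of the reward when both agents report, and all numeric values are assumptions made purely for illustration. Reporting deposits, smart contract enforcement, and encrypted communication are not modeled.

```python
from itertools import product

SILENT, REPORT = "silent", "report"


def payoff(me, other, gain, reward, penalty):
    """Payoff to one colluding agent (hypothetical model, not from the paper)."""
    if me == SILENT and other == SILENT:
        return gain          # collusion proceeds undetected
    if me == REPORT and other == SILENT:
        return reward        # sole reporter collects the full reward
    if me == SILENT and other == REPORT:
        return -penalty      # reported participant is penalized
    return reward / 2        # assumption: two reporters split the reward


def is_nash(profile, gain, reward, penalty):
    """True if neither agent can gain by unilaterally changing its action."""
    a, b = profile
    for alt in (SILENT, REPORT):
        if payoff(alt, b, gain, reward, penalty) > payoff(a, b, gain, reward, penalty):
            return False
        if payoff(alt, a, gain, reward, penalty) > payoff(b, a, gain, reward, penalty):
            return False
    return True


def equilibria(gain, reward, penalty):
    """List all pure-strategy Nash equilibria of the 2x2 game."""
    return [
        p for p in product((SILENT, REPORT), repeat=2)
        if is_nash(p, gain, reward, penalty)
    ]


# Illustrative values only: collusion gain 4, then reward 6 and penalty 5.
print("no intervention:  ", equilibria(gain=4, reward=0, penalty=0))
print("with intervention:", equilibria(gain=4, reward=6, penalty=5))
```

Under these assumed numbers, mutual silence is an equilibrium when reward and penalty are both zero, and it stops being one once the reward for a sole reporter exceeds the collusion gain, because each agent then does better by defecting. The only remaining pure-strategy equilibrium is mutual reporting, which is the induced defection the abstract describes.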