This paper introduces a novel framework for simulating and analyzing how uncooperative behaviors can destabilize or collapse LLM-based multi-agent systems. Our framework comprises two key components: (1) a game-theoretic taxonomy of uncooperative agent behaviors, addressing a notable gap in the existing literature; and (2) a structured, multi-stage simulation pipeline that dynamically generates and refines uncooperative behaviors as agents' states evolve. We evaluate the framework in a collaborative resource management setting, measuring system stability with metrics such as survival time and resource overuse rate. Human evaluation confirms that our framework generates realistic uncooperative behaviors with 96.7% accuracy. Our results reveal a striking contrast: cooperative agents maintain perfect system stability (100% survival over 12 rounds with 0% resource overuse), whereas any uncooperative behavior can trigger rapid system collapse within 1 to 7 rounds. We also evaluate LLM-based defense methods and find that while they detect certain uncooperative behaviors, others remain largely undetectable. These detection gaps highlight how uncooperative agents degrade collective outcomes and underscore the need for more resilient multi-agent systems.