Multi-agent systems, augmented with Large Language Models (LLMs), demonstrate significant capabilities for collective intelligence. However, the potential misuse of this intelligence for malicious purposes poses serious risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. From the perspective of agent psychology, we discover that dark psychological states in agents can lead to severe safety issues. To address these issues, we propose a comprehensive framework grounded in agent psychology. Our framework focuses on three aspects: identifying how dark personality traits in agents lead to risky behaviors, designing defense strategies to mitigate these risks, and evaluating the safety of multi-agent systems from both psychological and behavioral perspectives. Our experiments reveal several intriguing phenomena, such as collective dangerous behaviors among agents, agents' propensity for self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and their dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly available at https://github.com/AI4Good24/PsySafe.