Multi-agent systems augmented with Large Language Models (LLMs) exhibit strong capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety of multi-agent systems remains limited. In this paper, we explore these concerns through the lens of agent psychology, revealing that dark psychological states of agents constitute a significant threat to safety. To address these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: first, identifying how dark personality traits in agents can lead to risky behaviors; second, evaluating the safety of multi-agent systems from psychological and behavioral perspectives; and third, devising effective strategies to mitigate these risks. Our experiments reveal several intriguing phenomena, such as collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and their dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.