In the rapidly evolving field of artificial intelligence, ensuring that Large Language Models (LLMs) make safe decisions is a significant challenge. This paper introduces the Governance of the Commons Simulation (GovSim), a simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Through this simulation environment, we explore the dynamics of resource sharing among AI agents, highlighting the importance of ethical considerations, strategic planning, and negotiation skills. GovSim is versatile and supports any text-based agent, including LLM agents. Using the Generative Agent framework, we create a standard agent that facilitates the integration of different LLMs. Our findings reveal that within GovSim, only two of the 15 tested LLMs achieved a sustainable outcome, indicating a significant gap in models' ability to manage shared resources. Furthermore, we find that when agents' ability to communicate is removed, they overuse the shared resource, highlighting the importance of communication for cooperation. Interestingly, most LLMs lack the ability to make universalized hypotheses, revealing a significant weakness in their reasoning skills. We open source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface.
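The tragedy-of-the-commons dynamic that GovSim studies can be illustrated with a toy harvest-and-regrowth loop. This is a minimal sketch only: the capacity, regrowth rule, and harvest policies below are illustrative assumptions, not the paper's actual environment or agent implementation.

```python
# Toy common-pool resource model: agents harvest from a shared stock,
# and whatever remains regrows up to a fixed capacity. All numbers and
# rules here are hypothetical, chosen only to show why over-harvesting
# collapses the resource while restrained harvesting sustains it.

CAPACITY = 100  # assumed maximum stock the environment can hold


def step(stock, harvests):
    """Apply each agent's harvest, then regrow the remaining stock."""
    stock -= sum(harvests)
    if stock <= 0:
        return 0  # once depleted, the resource cannot recover
    return min(2 * stock, CAPACITY)  # simple doubling regrowth, capped


def run(policy, agents=5, rounds=10):
    """Run several rounds with identical agents; return the final stock."""
    stock = CAPACITY
    for _ in range(rounds):
        stock = step(stock, [policy(stock, agents)] * agents)
        if stock == 0:
            return 0
    return stock


def greedy(stock, n):
    return stock // n  # split the entire stock among agents now


def sustainable(stock, n):
    return stock // (2 * n)  # harvest only half, leaving room to regrow
```

Under these assumptions, the greedy policy depletes the stock in the first round, while the sustainable policy holds the stock at capacity indefinitely, mirroring the sustainable-versus-collapse outcomes the paper measures.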