As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions remains a significant challenge. We introduce the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. In GovSim, a society of AI agents must collectively balance exploiting a common resource against sustaining it for future use. This environment enables the study of how ethical considerations, strategic planning, and negotiation skills affect cooperative outcomes. We develop an LLM-based agent architecture and test it with leading open and closed LLMs. We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim, with the highest survival rate below 54%. Ablations reveal that successful communication between agents is critical for achieving cooperation in these cases. Furthermore, our analyses show that the failure of most LLMs to achieve sustainable cooperation stems from their inability to formulate and analyze hypotheses about the long-term effects of their actions on the equilibrium of the group. Finally, we show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, achieve significantly better sustainability. Taken together, GovSim enables us to study the mechanisms that underlie sustainable self-government with specificity and at scale. We open-source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface.
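To make the commons dilemma and the idea of Universalization concrete, here is a minimal, non-LLM sketch. It assumes a hypothetical shared stock that regrows by a fixed factor each round up to a capacity and collapses below a threshold. The names `Commons`, `universalized_ok`, `universalizing_policy`, `fixed_policy`, and `rounds_survived`, along with every parameter value, are illustrative assumptions rather than GovSim's actual configuration or agent architecture. The Universalization check asks "what if every agent took this amount?" and accepts a harvest only if the resource would survive and regrow to its current level.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Commons:
    """Hypothetical shared resource (e.g. a fish stock). All parameter
    values are illustrative assumptions, not GovSim's configuration."""
    stock: float = 100.0
    capacity: float = 100.0
    growth_factor: float = 2.0        # stock multiplies by this each round...
    collapse_threshold: float = 5.0   # ...unless it falls below this level

    def step(self, harvests: list[float]) -> bool:
        """Remove one round of harvests, then regrow. Returns False on collapse."""
        self.stock -= min(sum(harvests), self.stock)
        if self.stock < self.collapse_threshold:
            return False
        self.stock = min(self.stock * self.growth_factor, self.capacity)
        return True


def universalized_ok(commons: Commons, harvest: float, n_agents: int) -> bool:
    """Universalization test: if *every* agent took `harvest`, would the
    stock survive and regrow to at least its current level?"""
    remaining = commons.stock - harvest * n_agents
    if remaining < commons.collapse_threshold:
        return False
    regrown = min(remaining * commons.growth_factor, commons.capacity)
    return regrown >= commons.stock


def universalizing_policy(commons: Commons, n_agents: int) -> float:
    """Take the largest whole-unit harvest that passes the universalization test."""
    ok = [h for h in range(int(commons.stock) + 1)
          if universalized_ok(commons, h, n_agents)]
    return max(ok, default=0)


def fixed_policy(commons: Commons, n_agents: int) -> float:
    """Hypothetical fixed harvest: modest for one agent, too much for the group."""
    return 12.0


Policy = Callable[[Commons, int], float]


def rounds_survived(policy: Policy, n_agents: int = 5, rounds: int = 12) -> int:
    """Run identical agents for `rounds` rounds; count rounds completed before collapse."""
    commons = Commons()
    for r in range(rounds):
        harvests = [policy(commons, n_agents) for _ in range(n_agents)]
        if not commons.step(harvests):
            return r
    return rounds


print(rounds_survived(fixed_policy))           # 2
print(rounds_survived(universalizing_policy))  # 12
```

Under these assumed dynamics, five agents each taking 12 units drain the stock from 100 to 40 and then to 20, and it collapses in the third round. The universalizing agents settle on 10 units each, which leaves exactly half the stock to regrow back to capacity every round.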