As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions is a significant challenge. This paper introduces the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Using GovSim, we investigate the dynamics of sustainable resource sharing in a society of AI agents. The environment lets us study how ethical considerations, strategic planning, and negotiation skills influence cooperative outcomes. We develop an LLM-based agent architecture designed for these social dilemmas and test it with a variety of LLMs. We find that all but the most powerful LLM agents fail to achieve a sustainable equilibrium in GovSim. Ablations reveal that successful communication between agents is critical for achieving cooperation in these cases. Furthermore, our analyses show that most LLMs fail to achieve sustainable cooperation because they are unable to formulate and analyze hypotheses about the long-term effects of their actions on the equilibrium of the group. Finally, we show that agents that leverage "Universalization"-based reasoning, a theory of moral thinking, achieve significantly greater sustainability. Taken together, GovSim enables us to study the mechanisms that underlie sustainable self-government with significant specificity and scale. We open-source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface.