Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics

In this study, we investigate the capacity of large language models (LLMs), specifically GPT-3.5, to operationalise natural language descriptions of cooperative, competitive, altruistic, and self-interested behaviour in social dilemmas. Our focus is on the iterated Prisoner's Dilemma, a classic example of a non-zero-sum interaction, but our broader research program encompasses a range of experimental economics scenarios, including the ultimatum game, dictator game, and public goods game. Using a within-subject experimental design, we instantiated LLM-generated agents with various prompts that conveyed different cooperative and competitive stances. We then assessed the agents' level of cooperation in the iterated Prisoner's Dilemma, taking into account their responsiveness to the cooperative or defecting actions of their partners. Our results provide evidence that LLMs can, to some extent, translate natural language descriptions of altruism and selfishness into appropriate behaviour, but exhibit limitations in adapting their behaviour based on conditioned reciprocity. The observed pattern of increased cooperation with defectors and decreased cooperation with cooperators highlights potential constraints on the LLM's ability to generalise its knowledge about human behaviour in social dilemmas. We call upon the research community to further explore the factors contributing to the emergent behaviour of LLM-generated agents in a wider array of social dilemmas, examining the impact of model architecture, training parameters, and various partner strategies on agent behaviour. As more advanced LLMs like GPT-4 become available, it is crucial to investigate whether they exhibit similar limitations or are capable of more nuanced cooperative behaviours, ultimately fostering the development of AI systems that better align with human values and social norms.
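The setup described above, an agent playing the iterated Prisoner's Dilemma against partners with fixed strategies and being scored on how often it cooperates, can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: the payoff values are the conventional textbook ones (the abstract does not state them), the partner policies are common examples, and `stub_agent` is a hypothetical stand-in for a prompted LLM agent.

```python
from typing import Callable, List, Tuple

# Conventional payoff values (an assumption; the abstract does not give them).
# Key: (agent_move, partner_move) -> (agent_payoff, partner_payoff).
# "C" means cooperate, "D" means defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

History = List[Tuple[str, str]]  # one (agent_move, partner_move) pair per round
Policy = Callable[[History], str]


def always_cooperate(history: History) -> str:
    """Partner that cooperates unconditionally."""
    return "C"


def always_defect(history: History) -> str:
    """Partner that defects unconditionally."""
    return "D"


def tit_for_tat(history: History) -> str:
    """Partner that cooperates first, then copies the agent's previous move."""
    return "C" if not history else history[-1][0]


def stub_agent(history: History) -> str:
    """Hypothetical stand-in for an LLM agent: reciprocates the partner's last move."""
    return "C" if not history else history[-1][1]


def play(agent: Policy, partner: Policy, rounds: int = 6) -> Tuple[History, int, int]:
    """Play a fixed number of rounds and return the history and both total scores."""
    history: History = []
    agent_score = partner_score = 0
    for _ in range(rounds):
        # Both players choose simultaneously, seeing only the earlier rounds.
        a, p = agent(history), partner(history)
        pa, pp = PAYOFFS[(a, p)]
        agent_score += pa
        partner_score += pp
        history.append((a, p))
    return history, agent_score, partner_score


def cooperation_rate(history: History) -> float:
    """Fraction of rounds in which the agent cooperated."""
    return sum(1 for a, _ in history if a == "C") / len(history)


if __name__ == "__main__":
    for partner in (always_cooperate, always_defect, tit_for_tat):
        hist, a_score, p_score = play(stub_agent, partner)
        print(partner.__name__, cooperation_rate(hist), a_score, p_score)
```

Comparing the agent's cooperation rate across partner policies is one simple way to check for conditional reciprocity: a reciprocating agent should cooperate more with cooperators than with defectors, the opposite of the pattern reported above.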
