The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic game theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of three LLMs (Llama2, Llama3, and GPT3.5) when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds and analyzed the LLMs' decisions along dimensions defined in the behavioral economics literature. We find that all models tend not to initiate defection but act cautiously, favoring cooperation over defection only when the opponent's defection rate is low. Overall, LLMs behave at least as cooperatively as the typical human player, although our results indicate substantial differences among models. In particular, Llama2 and GPT3.5 are more cooperative than humans, and especially forgiving and non-retaliatory when the opponent's defection rate is below 30%. More similar to humans, Llama3 exhibits consistently uncooperative and exploitative behavior unless the opponent always cooperates. Our systematic approach to the study of LLMs in game-theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.