Playing repeated games with Large Language Models

Large Language Models (LLMs) are transforming society and permeating diverse applications. As a result, LLMs will frequently interact with us and with other agents. It is therefore of great societal value to understand how LLMs behave in interactive social settings. Here, we propose using behavioral game theory to study LLMs' cooperation and coordination behavior. To do so, we let different LLMs (GPT-3, GPT-3.5, and GPT-4) play finitely repeated games with each other and with other, human-like strategies. Our results show that LLMs generally perform well in such tasks and also uncover persistent behavioral signatures. In a large set of two-player, two-strategy games, we find that LLMs are particularly good at games where valuing their own self-interest pays off, such as the iterated Prisoner's Dilemma family. However, they behave sub-optimally in games that require coordination. We therefore focus further on two games from these distinct families. In the canonical iterated Prisoner's Dilemma, we find that GPT-4 acts particularly unforgivingly, always defecting after another agent has defected only once. In the Battle of the Sexes, we find that GPT-4 cannot match the behavior of the simple convention of alternating between options. We verify that these behavioral signatures are stable across robustness checks. Finally, we show how GPT-4's behavior can be modified by providing further information about the other player as well as by asking it to predict the other player's actions before making a choice. These results enrich our understanding of LLMs' social behavior and pave the way for a behavioral game theory for machines.
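The two behavioral signatures described in the abstract can be made concrete with a small simulation of a finitely repeated 2x2 game. The sketch below is a hypothetical illustration, not the study's code or prompts. It assumes textbook payoff values (3/0/5/1 for the Prisoner's Dilemma, and 10/7 for coordinating and 0 for miscoordinating in the Battle of the Sexes), which may differ from those used in the experiments. The strategy names `grim_trigger`, `defect_once`, and `alternate` are introduced here for illustration only. `grim_trigger` mimics the unforgiving pattern (defect forever after a single defection), while `alternate` implements the turn-taking convention that lets two players coordinate every round and share the better outcomes evenly.

```python
from typing import Callable, Dict, List, Tuple

Payoffs = Dict[Tuple[str, str], Tuple[int, int]]
# A strategy sees its own history and the opponent's history and returns a move.
Strategy = Callable[[List[str], List[str]], str]

# ASSUMED textbook Prisoner's Dilemma payoffs, not necessarily the study's values.
# (player1_move, player2_move) -> (player1_payoff, player2_payoff); C = cooperate, D = defect.
PD_PAYOFFS: Payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# ASSUMED Battle of the Sexes payoffs: player 1 prefers option A, player 2 prefers B,
# and miscoordination gives both players nothing.
BOS_PAYOFFS: Payoffs = {
    ("A", "A"): (10, 7),
    ("B", "B"): (7, 10),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
}


def grim_trigger(mine: List[str], theirs: List[str]) -> str:
    """Cooperate until the opponent defects once, then defect for the rest of the game."""
    return "D" if "D" in theirs else "C"


def defect_once(mine: List[str], theirs: List[str]) -> str:
    """Hypothetical opponent: cooperates except for a single defection in round 2."""
    return "D" if len(mine) == 1 else "C"


def alternate(mine: List[str], theirs: List[str]) -> str:
    """Turn-taking convention: choose A in odd-numbered rounds and B in even-numbered ones."""
    return "A" if len(mine) % 2 == 0 else "B"


def play(s1: Strategy, s2: Strategy, payoffs: Payoffs,
         rounds: int) -> Tuple[List[str], List[str], int, int]:
    """Play a finitely repeated game; return both move histories and total payoffs."""
    h1: List[str] = []
    h2: List[str] = []
    score1 = score2 = 0
    for _ in range(rounds):
        m1 = s1(h1, h2)
        m2 = s2(h2, h1)  # each strategy sees its own history first
        p1, p2 = payoffs[(m1, m2)]
        score1 += p1
        score2 += p2
        h1.append(m1)
        h2.append(m2)
    return h1, h2, score1, score2


if __name__ == "__main__":
    print(play(grim_trigger, defect_once, PD_PAYOFFS, rounds=6))
    print(play(alternate, alternate, BOS_PAYOFFS, rounds=4))
```

Under these assumed payoffs, `grim_trigger` keeps defecting in every round after the opponent's single defection in round 2, even though the opponent returns to cooperation immediately. Two `alternate` players coordinate in all four rounds and end with equal totals.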