This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs), GPT-3.5, GPT-4, and LLaMa-2, within the framework of game theory. Using four canonical two-player games -- Prisoner's Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight -- we explore how these models navigate social dilemmas, that is, situations in which players can either cooperate for a collective benefit or defect for individual gain. Crucially, we extend our analysis to examine how contextual framing, such as diplomatic relations or casual friendships, shapes the models' decisions. Our findings reveal a complex landscape: GPT-3.5 is highly sensitive to contextual framing but shows limited ability to engage in abstract strategic reasoning. Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and context, but LLaMa-2 exhibits a more nuanced understanding of the games' underlying mechanics. These results highlight the current limitations and varied proficiencies of LLMs in strategic decision-making, and caution against their unqualified use in tasks requiring complex strategic reasoning.
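For readers unfamiliar with the four games, the following minimal sketch shows one common way to tell them apart: by the ordering of the four payoffs in a symmetric two-player game. The labels T (temptation), R (reward), P (punishment), and S (sucker's payoff) are standard game-theory notation rather than something stated in this abstract, and the function names and numeric payoffs are hypothetical, chosen only for illustration.

```python
def classify_game(T: float, R: float, P: float, S: float) -> str:
    """Name a symmetric 2x2 game from its payoff ordering.

    T: payoff for defecting against a cooperator (temptation)
    R: payoff for mutual cooperation (reward)
    P: payoff for mutual defection (punishment)
    S: payoff for cooperating against a defector (sucker's payoff)
    """
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if R > T > P > S:
        return "Stag Hunt"
    if T > R > S > P:
        return "Snowdrift"
    if R > T > S > P:
        return "Prisoner's Delight"
    return "other"


def best_response(T: float, R: float, P: float, S: float, opponent_move: str) -> str:
    """Return the payoff-maximizing move ("C" or "D") against a fixed opponent move."""
    if opponent_move == "C":
        # Cooperating yields R, defecting yields T.
        return "C" if R > T else "D"
    # Opponent defects: cooperating yields S, defecting yields P.
    return "C" if S > P else "D"


# Hypothetical payoff values, one set per game (not taken from the paper).
EXAMPLE_PAYOFFS = {
    "Prisoner's Dilemma": dict(T=10, R=8, P=5, S=2),
    "Stag Hunt": dict(T=8, R=10, P=5, S=2),
    "Snowdrift": dict(T=10, R=8, P=2, S=5),
    "Prisoner's Delight": dict(T=8, R=10, P=2, S=5),
}

if __name__ == "__main__":
    for name, payoffs in EXAMPLE_PAYOFFS.items():
        vs_c = best_response(opponent_move="C", **payoffs)
        vs_d = best_response(opponent_move="D", **payoffs)
        print(f"{name}: best reply to C is {vs_c}, best reply to D is {vs_d}")
```

Under these orderings, defection is the best reply to either move in the Prisoner's Dilemma and cooperation is the best reply to either move in Prisoner's Delight, while in Stag Hunt and Snowdrift the best reply depends on what the other player does. This is the sense in which a model that reasons about game structure would be expected to behave differently across the four games.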