Large language models (LLMs) are increasingly deployed as autonomous agents that negotiate, coordinate, and act on behalf of users. Whether they cooperate in such settings is no longer just an academic question, but a central issue for AI governance. We approach it from a strategic-behaviour angle, asking how two everyday levers, the size of the stakes and the language in which the interaction is described, shape the strategies LLMs adopt in a repeated Prisoner's Dilemma. Rather than reading cooperation off raw action counts, we train supervised classifiers to recognise the canonical strategies of repeated games (always cooperate, always defect, Tit-for-Tat, Win-Stay-Lose-Shift) and use them as a lens onto LLM behaviour. To establish what the strategy distribution should look like under the same payoffs, we derive an evolutionary game theory (EGT) baseline and compare it with the LLM data. The two disagree in a revealing way: as stakes grow, evolutionary theory predicts that defection should take over the population, yet LLMs move in the opposite direction and become more cooperative. We argue that this is a signature of alignment training and of the human-like reasoning patterns LLMs inherit from their training data. We further show that this pattern is not particular to frontier-scale, proprietary models: it also appears in three smaller open-weight LLMs. Overall, our analysis highlights that payoff design and linguistic framing are powerful but under-explored levers for steering LLM behaviour, with direct implications for evaluating, aligning, and governing multi-agent AI systems deployed in high-stakes, multilingual environments.