Payoff scaling shapes cooperation in LLM agents across languages

Trung-Kiet Huynh, Dao-Sy Duy-Minh, Thanh-Bang Cao, Phong-Hao Le, Hong-Dan Nguyen, Phu-Quy Nguyen-Lam, Minh-Luan Nguyen-Vo, Hong-Phat Pham, Phu-Hoa Pham, Thien-Kim Than, Chi-Nguyen Tran, Huy Tran, Gia-Thoai Tran-Le, Alessio Buscemi, Le Hong Trang, The Anh Han

From arXiv: 44 pages, 17 figures, 4 tables

Large language models (LLMs) are increasingly deployed as autonomous agents that negotiate, coordinate, and act on behalf of users. Whether they cooperate in such settings is no longer just an academic question, but a central issue for AI governance. We approach it from a strategic-behaviour angle, asking how two everyday levers - the size of what is at stake, and the language in which the interaction is described - shape the strategies LLMs adopt in a repeated Prisoner's Dilemma. Rather than reading cooperation off raw action counts, we train supervised classifiers to recognise the canonical strategies of repeated games (always cooperate, always defect, Tit-for-Tat, Win-Stay-Lose-Shift) and use them as a lens onto LLM behaviour. To know what the strategy distribution should look like under the same payoffs, we derive an evolutionary game theory (EGT) baseline and compare it with the LLM data. The two outcomes disagree in a revealing way: as stakes grow, evolutionary theory predicts that defection should take over the population, yet LLMs move in the opposite direction, becoming more cooperative - a signature, we argue, of alignment training and the human-like reasoning patterns LLMs inherit from their training data. We further show that this picture is not particular to frontier-scale, proprietary models: it also occurs with three open-weight smaller LLMs. Overall, our analysis highlights that payoff design and linguistic framing are powerful but under-explored levers for steering LLM behaviour, with direct implications for evaluating, aligning, and governing multi-agent AI systems deployed in high-stakes, multilingual environments.
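To make the four canonical strategies named in the abstract concrete, the following is a minimal sketch of a repeated Prisoner's Dilemma in which every payoff is multiplied by a single stake factor. It is an illustration only, not the authors' code: the base payoffs (T=5, R=3, P=1, S=0), the `scale` parameter, and all function names are assumptions chosen for this example, and the abstract does not say how the paper actually scales payoffs.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"

# Conventional textbook payoffs with T > R > P > S.
# Assumed for illustration; these are NOT taken from the paper.
BASE = {"T": 5.0, "R": 3.0, "P": 1.0, "S": 0.0}

# Each history entry is (own move, opponent move) for one past round.
History = List[Tuple[str, str]]
Strategy = Callable[[History], str]


def payoff(me: str, other: str, scale: float = 1.0) -> float:
    """Payoff to `me` for one round; `scale` is a hypothetical stake multiplier."""
    if me == C and other == C:
        key = "R"  # reward for mutual cooperation
    elif me == C and other == D:
        key = "S"  # sucker's payoff
    elif me == D and other == C:
        key = "T"  # temptation to defect
    else:
        key = "P"  # punishment for mutual defection
    return scale * BASE[key]


def always_cooperate(history: History) -> str:
    return C


def always_defect(history: History) -> str:
    return D


def tit_for_tat(history: History) -> str:
    # Cooperate first, then copy the opponent's previous move.
    return C if not history else history[-1][1]


def win_stay_lose_shift(history: History) -> str:
    # Cooperate first. Repeat the last move after a "win" (opponent
    # cooperated, payoff R or T); switch after a "loss" (payoff S or P).
    if not history:
        return C
    mine, theirs = history[-1]
    if theirs == C:
        return mine
    return D if mine == C else C


def play(a: Strategy, b: Strategy, rounds: int = 10, scale: float = 1.0):
    """Play `rounds` rounds; return both totals and both move sequences."""
    hist_a: History = []
    hist_b: History = []
    score_a = score_b = 0.0
    for _ in range(rounds):
        move_a, move_b = a(hist_a), b(hist_b)
        score_a += payoff(move_a, move_b, scale)
        score_b += payoff(move_b, move_a, scale)
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    moves_a = "".join(m for m, _ in hist_a)
    moves_b = "".join(m for m, _ in hist_b)
    return score_a, score_b, moves_a, moves_b
```

For example, `play(tit_for_tat, always_defect, rounds=5)` produces the move sequences `CDDDD` and `DDDDD`. Move sequences like these are the kind of behavioural trace a strategy classifier would label. In this sketch, changing `scale` multiplies the scores without changing the moves of these fixed rules, which is the contrast with the stake-dependent LLM behaviour the abstract reports.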

