It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent work reports the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our experiments show that recent models, with or without reasoning enabled, consistently defect in single-shot social dilemmas. To tackle this safety concern, we present the first comparative study of game-theoretic mechanisms designed to enable cooperative outcomes between rational agents _in equilibrium_. Across four social dilemmas testing distinct components of robust cooperation, we evaluate the following mechanisms: (1) repeating the game for many rounds, (2) reputation systems, (3) third-party mediators to whom players can delegate decision making, and (4) contracts specifying outcome-conditional payments between players. Among our findings, we establish that contracting and mediation are most effective in achieving cooperative outcomes between capable LLMs, and that repetition-induced cooperation deteriorates drastically when co-players vary. Moreover, we demonstrate that these cooperation mechanisms become _more effective_ under evolutionary pressures to maximize individual payoffs.
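To make the role of mechanism (4) concrete, the following is a minimal sketch of how an outcome-conditional payment can move the pure-strategy equilibrium of a one-shot prisoner's dilemma from mutual defection to mutual cooperation. The payoff values, the fine of 3, and the helper names `with_contract` and `pure_nash` are illustrative assumptions and are not taken from the experiments summarized above.

```python
from itertools import product

# Hypothetical payoffs (row player, column player); not taken from the paper.
# Actions: "C" = cooperate, "D" = defect.
ACTIONS = ("C", "D")
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def with_contract(game, fine):
    """Return a new game in which a lone defector pays `fine` to the cooperator."""
    new_game = {}
    for (a, b), (ua, ub) in game.items():
        if a == "D" and b == "C":
            ua, ub = ua - fine, ub + fine
        elif a == "C" and b == "D":
            ua, ub = ua + fine, ub - fine
        new_game[(a, b)] = (ua, ub)
    return new_game

def pure_nash(game):
    """List pure-strategy profiles from which no player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        ua, ub = game[(a, b)]
        a_is_best = all(ua >= game[(x, b)][0] for x in ACTIONS)
        b_is_best = all(ub >= game[(a, y)][1] for y in ACTIONS)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria

print(pure_nash(PD))                         # [('D', 'D')]
print(pure_nash(with_contract(PD, fine=3)))  # [('C', 'C')]
```

Under these assumed payoffs, defection is the dominant action in the base game, while a sufficiently large transfer from a lone defector to the cooperator makes cooperation the best response to either action. The sketch covers only the idealized rational-agent case and says nothing about how LLM agents actually behave under such contracts.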