The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic game theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of three LLMs (Llama2, Llama3, and GPT3.5) when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds and analyzed the LLMs' decisions along dimensions defined in the behavioral economics literature. We find that all models tend not to initiate defection but act cautiously, favoring cooperation over defection only when the opponent's defection rate is low. Overall, LLMs behave at least as cooperatively as the typical human player, although our results indicate substantial differences among models. In particular, Llama2 and GPT3.5 are more cooperative than humans, and especially forgiving and non-retaliatory when the opponent's defection rate is below 30%. More similar to humans, Llama3 exhibits consistently uncooperative and exploitative behavior unless the opponent always cooperates. Our systematic approach to the study of LLMs in game-theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.