As Large Language Model (LLM)-based agents increasingly undertake real-world tasks and engage with human society, how well do we understand their behaviors? We (1) investigate how LLM agents' prosocial behaviors -- a fundamental social norm -- can be induced by different personas and benchmarked against human behaviors; and (2) introduce a behavioral and social science approach to evaluating LLM agents' decision-making. We explored how different personas and experimental framings affect these AI agents' altruistic behavior in dictator games and compared their behaviors within the same LLM family, across various families, and with human behaviors. The findings reveal substantial variations and inconsistencies among LLMs and notable differences from human behaviors. Merely assigning a human-like identity to LLMs does not produce human-like behaviors. Despite being trained on extensive human-generated data, these AI agents cannot capture the internal processes of human decision-making. Their alignment with human behaviors is highly variable and dependent on specific model architectures and prompt formulations; worse still, this dependence does not follow a clear pattern. LLMs can be useful task-specific tools but are not yet intelligent human-like agents.