Large language models (LLMs) are increasingly used for role-playing tasks, especially for impersonating domain-specific experts, primarily through role-playing prompts. In real-world interactions, a role's decision-making abilities significantly shape its behavioral patterns. In this paper, we focus on evaluating the decision-making abilities of LLMs after role-playing, thereby validating the efficacy of role-playing. Our goal is to provide metrics and guidance for enhancing the decision-making abilities of LLMs in role-playing tasks. Specifically, we first use LLMs to generate virtual role descriptions corresponding to the 16 personality types of the Myers-Briggs Type Indicator (MBTI), which represent a segmentation of the population. We then design specific quantitative operations to evaluate the decision-making abilities of LLMs after role-playing from four aspects: adaptability, exploration-exploitation trade-off ability, reasoning ability, and safety. Finally, we use GPT-4 to analyze the association between decision-making performance and the corresponding MBTI types. Extensive experiments demonstrate stable differences in the four aspects of decision-making abilities across distinct roles, indicating a robust correlation between decision-making abilities and the roles emulated by LLMs. These results underscore that LLMs can effectively impersonate varied roles while embodying their genuine sociological characteristics.
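The 16 MBTI types used as role seeds are exactly the combinations of four binary dichotomies (E/I, S/N, T/F, J/P). The sketch below enumerates them and wraps each in a role-playing prompt. The prompt template and the function names `mbti_types` and `role_prompt` are hypothetical illustrations, not the prompts used in this work.

```python
from itertools import product

# The four MBTI dichotomies; each type takes one letter from each pair.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def mbti_types():
    """Return all 16 MBTI type codes as the Cartesian product of the dichotomies."""
    return ["".join(letters) for letters in product(*DICHOTOMIES)]


def role_prompt(mbti_type, description):
    """Hypothetical role-playing prompt template (an assumption, not the paper's)."""
    return (
        f"You are playing a person whose MBTI type is {mbti_type}. "
        f"Role description: {description}\n"
        "Stay in character when making decisions."
    )


types = mbti_types()
print(len(types))           # 16
print(types[0], types[-1])  # ESTJ INFP
```

In practice, the `description` argument would hold the LLM-generated virtual role description for each type.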