Large language models (LLMs) are increasingly used to simulate human behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans. Human personality, for instance, shapes how individuals navigate social interactions, including the strategies and behaviors they adopt in emotionally charged situations. This raises the question: can LLMs, when prompted with personality traits, reproduce personality-driven differences in human conflict behavior? To explore this, we introduce an evaluation framework that enables direct comparison of human-human and LLM-LLM behavior in dispute resolution dialogues with respect to Big Five Inventory (BFI) personality traits. The framework provides a set of interpretable metrics covering strategic behavior and conflict outcomes. We additionally contribute a novel methodology for constructing LLM dispute resolution dialogue datasets whose scenarios and personality traits are matched to those of human conversations. Finally, we demonstrate our evaluation framework on three contemporary closed-source LLMs and find significant divergences between how personality manifests in conflict across these LLMs and in human data, challenging the assumption that personality-prompted agents can serve as reliable behavioral proxies in socially impactful applications. Our work highlights the need for psychological grounding and validation of AI simulations before real-world use.