In this work, we tackle the challenge of embedding realistic human personality traits into LLMs. Previous approaches have primarily relied on prompt-based methods that describe the behavior associated with the desired traits, which suffer from realism and validity issues. To address these limitations, we introduce BIG5-CHAT, a large-scale dataset of 100,000 dialogues designed to ground models in how humans express personality in language. Leveraging this dataset, we explore Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) as training-based methods to align LLMs more naturally with human personality patterns. Our methods outperform prompting on personality assessments such as the BFI and IPIP-NEO, with trait correlations that more closely match human data. Furthermore, our experiments reveal that models trained to exhibit higher conscientiousness, higher agreeableness, lower extraversion, and lower neuroticism perform better on reasoning tasks, consistent with psychological findings on how these traits affect human cognitive performance. To our knowledge, this is the first comprehensive study to demonstrate that training-based methods can shape LLM personalities by learning from real human behaviors.