Despite the many use cases for large language models (LLMs) in chatbot design across industries, and despite research showing the importance of personalizing chatbots to cater to different personality traits, little work has evaluated whether the behavior of personalized LLMs reflects particular personality traits accurately and consistently. We study the behavior of LLM-based simulated agents, which we refer to as LLM personas, and present a case study with GPT-3.5 (text-davinci-003) that investigates whether LLMs can generate content with consistent, personalized traits when assigned Big Five personality types and gender roles. We created 320 LLM personas (5 female and 5 male for each of the 32 Big Five personality types) and prompted them to complete the classic 44-item Big Five Inventory (BFI) and then write an 800-word story about their childhood. The personas' self-reported BFI scores were consistent with their assigned personality types, with large effect sizes on all five traits. Moreover, assigned personality types correlated significantly with some Linguistic Inquiry and Word Count (LIWC) psycholinguistic features of their writing. For instance, extroversion was associated with pro-social and active words, and neuroticism with words related to negative emotions and mental health. Between female and male LLM personas, the only significant differences we found were in the use of technological and cultural words. This work provides a first step toward further research on personalized LLMs and their applications in human-AI conversation.
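The persona count follows from the design: 32 personality types times 2 genders times 5 personas gives 320. A minimal sketch of that grid is shown here, assuming each of the five traits takes one of two levels (which is what 32 = 2^5 types implies). The "high"/"low" labels, the dictionary layout, and the names `build_personas` and `PERSONAS_PER_CELL` are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Big Five traits; the binary "high"/"low" labels are an assumption
# made for illustration (32 types = 2 levels ** 5 traits).
TRAITS = ["openness", "conscientiousness", "extroversion",
          "agreeableness", "neuroticism"]
LEVELS = ("high", "low")
GENDERS = ("female", "male")
PERSONAS_PER_CELL = 5  # 5 female and 5 male personas per personality type


def build_personas():
    """Enumerate every (personality type, gender, replicate) combination."""
    personas = []
    for levels in product(LEVELS, repeat=len(TRAITS)):
        profile = dict(zip(TRAITS, levels))
        for gender in GENDERS:
            for replicate in range(PERSONAS_PER_CELL):
                personas.append({
                    "profile": profile,
                    "gender": gender,
                    "replicate": replicate,
                })
    return personas


personas = build_personas()
print(len(personas))  # 32 types * 2 genders * 5 replicates = 320
```

Each entry could then be turned into a prompt for the model; how the prompts were actually worded is not stated in this abstract.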