Despite the many use cases for large language models (LLMs) in chatbot design across industries, and despite research showing the importance of personalizing chatbots to cater to different personality traits, little work has evaluated whether the behavior of personalized LLMs reflects assigned personality traits accurately and consistently. We study the behavior of LLM-based simulated agents, which we refer to as LLM personas, and present a case study with GPT-3.5 (text-davinci-003) to investigate whether LLMs can generate content with consistent, personalized traits when assigned Big Five personality types and gender roles. We created 320 LLM personas (5 females and 5 males for each of the 32 Big Five personality types) and prompted them to complete the classic 44-item Big Five Inventory (BFI) and then write an 800-word story about their childhood. Results showed that the LLM personas' self-reported BFI scores are consistent with their assigned personality types, with large effect sizes on all five traits. Moreover, significant correlations were found between assigned personality types and some Linguistic Inquiry and Word Count (LIWC) psycholinguistic features of their writing. For instance, extroversion is associated with pro-social and active words, and neuroticism is associated with words related to negative emotions and mental health. Between LLM-generated female and male personas, we found significant differences only in the use of technological and cultural words. This work provides a first step for further research on personalized LLMs and their applications in human-AI conversation.
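The persona count in the abstract (32 personality types × 2 genders × 5 personas = 320) can be illustrated with a short enumeration. This is a minimal sketch, not the authors' code: it assumes that the 32 types arise from a binary high/low level on each of the five traits (2^5 = 32), which the abstract implies but does not state, and the names `TRAITS`, `GENDERS`, `PER_GENDER`, and `build_personas` are hypothetical.

```python
from itertools import product

# Big Five trait names; the binary "high"/"low" levels are an assumption
# inferred from the abstract's count of 32 personality types (2**5 = 32).
TRAITS = ["extroversion", "agreeableness", "conscientiousness",
          "neuroticism", "openness"]
GENDERS = ["female", "male"]
PER_GENDER = 5  # the abstract states 5 female and 5 male personas per type


def build_personas():
    """Enumerate every (personality type, gender, replicate) combination."""
    personas = []
    for levels in product(("high", "low"), repeat=len(TRAITS)):
        profile = dict(zip(TRAITS, levels))
        for gender in GENDERS:
            for replicate in range(PER_GENDER):
                personas.append(
                    {"traits": profile, "gender": gender, "replicate": replicate}
                )
    return personas


personas = build_personas()
print(len(personas))  # 32 types * 2 genders * 5 replicates
```

Each entry could then be turned into a prompt that assigns the persona before administering the BFI and the story-writing task; the prompt wording itself is not given in the abstract and is left out here.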