This study investigates the capacity of Large Language Models (LLMs) to infer the Big Five personality traits from free-form user interactions. The results demonstrate that a chatbot powered by GPT-4 can infer personality with moderate accuracy, outperforming previous approaches that drew inferences from static text content. The accuracy of inferences varied across different conversational settings. Performance was highest when the chatbot was prompted to elicit personality-relevant information from users (mean r=.443, range=[.245, .640]), followed by a condition placing greater emphasis on naturalistic interaction (mean r=.218, range=[.066, .373]). Notably, the direct focus on personality assessment did not result in a less positive user experience: participants reported the interactions to be equally natural, pleasant, engaging, and humanlike across both conditions. A chatbot mimicking ChatGPT's default behavior of acting as a helpful assistant produced markedly inferior personality inferences and lower user experience ratings but still captured psychologically meaningful information for some of the personality traits (mean r=.117, range=[-.004, .209]). Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across socio-demographic subgroups. Our results highlight the potential of LLMs for psychological profiling based on conversational interactions. We discuss the practical implications and ethical challenges associated with these findings.