Personality testing of Large Language Models: Limited temporal stability, but highlighted prosociality

As Large Language Models (LLMs) continue to gain popularity due to their human-like traits and the intimacy they offer to users, their societal impact inevitably expands. This creates a growing need for comprehensive studies that explain how LLMs behave and reveal their potential opportunities, drawbacks, and overall societal impact. With that in mind, this research investigated seven LLMs, aiming to assess the temporal stability and inter-rater agreement of their responses to personality instruments at two time points. In addition, the LLMs' personality profiles were analyzed and compared to human normative data. The findings revealed varying levels of inter-rater agreement in the LLMs' responses over a short time, with some LLMs showing higher agreement (e.g., Llama3 and GPT-4o) than others (e.g., GPT-4 and Gemini). Furthermore, agreement depended on the instrument used as well as on the domain or trait measured. This implies that LLMs vary in how robustly they can simulate stable personality characteristics. On the scales that showed at least fair agreement, LLMs mostly displayed a socially desirable profile in both agentic and communal domains, as well as a prosocial personality profile reflected in higher agreeableness and conscientiousness and lower Machiavellianism. Exhibiting temporal stability and coherent responses on personality traits is crucial for AI systems because of their societal impact and AI safety concerns.
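To make the idea of agreement between two time points concrete, here is a minimal sketch that computes unweighted Cohen's kappa for one model's answers to the same items on two occasions. This is an illustration under stated assumptions, not the study's method: the abstract does not say which agreement statistic was used, and the function name `cohen_kappa` and the item responses below are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_t1, ratings_t2):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings.

    Assumption: kappa stands in for the agreement statistic; the source
    abstract does not name the statistic it used.
    """
    if len(ratings_t1) != len(ratings_t2) or not ratings_t1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_t1)
    # Observed agreement: share of items answered identically at both time points.
    observed = sum(a == b for a, b in zip(ratings_t1, ratings_t2)) / n
    # Chance agreement: product of the marginal category frequencies, summed over categories.
    counts_t1 = Counter(ratings_t1)
    counts_t2 = Counter(ratings_t2)
    expected = sum(counts_t1[c] * counts_t2[c] for c in counts_t1) / (n * n)
    if expected == 1.0:
        return 1.0  # both administrations used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 5-point Likert answers from one model to the same eight items.
time_1 = [4, 5, 3, 4, 2, 5, 4, 3]
time_2 = [4, 5, 3, 3, 2, 5, 4, 4]
kappa = cohen_kappa(time_1, time_2)
print(f"kappa = {kappa:.3f}")
```

For ordinal Likert items, a weighted kappa or an intraclass correlation may be a better fit, because unweighted kappa treats a one-point shift the same as a four-point shift. The phrase "at least fair agreement" in the abstract resembles the Landis and Koch convention, in which kappa between 0.21 and 0.40 is labelled fair, but the abstract does not state which benchmarks were applied.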

