Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable, inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this paper presents a framework for investigating psychology in LLMs, comprising psychological dimension identification, assessment dataset curation, and assessment with results validation. Following this framework, we introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence. The benchmark includes thirteen datasets featuring diverse scenarios and item types. Our findings indicate that LLMs manifest a broad spectrum of psychological attributes. We also uncover discrepancies between LLMs' self-reported traits and their behaviors in real-world scenarios. This paper demonstrates a thorough psychometric assessment of LLMs, providing insights into reliable evaluation and potential applications in AI and the social sciences.