Previous research on emergence in large language models shows that these models display apparent human-like abilities and latent psychological traits. However, results partly contradict one another on the expression and magnitude of these latent traits. They nonetheless agree on a worrisome tendency to score high on the Dark Triad of narcissism, psychopathy, and Machiavellianism. Together with a track record of derailments, this calls for more rigorous research on the safety of these models. We presented a state-of-the-art language model with the same personality questionnaire in nine languages and performed a Bayesian analysis of a Gaussian mixture model, finding evidence for a deeper-rooted issue. Our results suggest both interlingual and intralingual instabilities, which indicate that current language models do not develop a consistent core personality. This can lead to unsafe behaviour in artificial intelligence systems that are based on these foundation models and are increasingly integrated into human life. We subsequently discuss the shortcomings of modern psychometrics, abstract it, and provide a framework for its species-neutral, substrate-free formulation.