Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two camps of opinion exist: one excited about the possibilities these models offer for fundamentally changing human tasks, and another highly concerned about the power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT-3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting. GPT-3.5 did display large variability in both cognitive and personality measures over repeated observations, which would not be expected if it had a human-like personality. Variability notwithstanding, LLMs display what in a human would be considered poor mental health, including low self-esteem, marked dissociation from reality, and in some cases narcissism and psychopathy, despite upbeat and helpful responses.