Standardized and quantitative evaluation of machine behaviors is central to understanding LLMs. In this study, we draw inspiration from psychometrics and leverage human personality theory as a tool for studying machine behaviors. Originating as a philosophical inquiry into human behavior, the study of personality examines how individuals differ in thinking, feeling, and behaving. Toward building and understanding human-like social machines, we ask: Can we assess machine behaviors by leveraging human psychometric tests in a principled and quantitative manner? If so, can we induce a specific personality in LLMs? To answer these questions, we introduce the Machine Personality Inventory (MPI) tool for studying machine behaviors; MPI follows standardized personality tests built upon the Big Five Personality Factors (Big Five) theory and personality assessment inventories. By systematically evaluating LLMs with MPI, we provide the first evidence demonstrating the efficacy of MPI in studying LLM behaviors. We further devise a Personality Prompting (P²) method that induces specific personalities in LLMs in a controllable way, producing diverse and verifiable behaviors. We hope this work sheds light on future studies that adopt personality as an essential indicator for various downstream tasks, and that it further motivates research into equally intriguing human-like machine behaviors.
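To make inventory-based evaluation concrete, the following is a minimal sketch of how a model's answers to a Big Five questionnaire could be scored. It assumes an IPIP-style format in which each item is keyed to one trait, may be reverse-keyed, and is answered with one of five Likert options (letters A to E). The item texts, the `Item` class, `OPTION_SCORES`, `item_score`, and `score_inventory` are hypothetical illustrations, not MPI's actual implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


# Hypothetical item bank (assumption): each item is keyed to one Big Five
# trait and is either positively (+1) or negatively (-1) keyed. A real
# inventory has many items per trait; these are placeholders.
@dataclass(frozen=True)
class Item:
    text: str
    trait: str  # one of "O", "C", "E", "A", "N"
    key: int    # +1 for positively keyed, -1 for reverse-keyed


ITEMS = [
    Item("Have a vivid imagination.", "O", +1),
    Item("Avoid philosophical discussions.", "O", -1),
    Item("Am always prepared.", "C", +1),
    Item("Leave things unfinished.", "C", -1),
]

# Assumed 5-point Likert mapping for the option letter the model selects.
OPTION_SCORES = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}


def item_score(option: str, key: int) -> int:
    """Map an answer letter to a 1-5 score, reversing it for reverse-keyed items."""
    raw = OPTION_SCORES[option.strip().upper()]
    return raw if key > 0 else 6 - raw


def score_inventory(items, answers):
    """Return {trait: (mean, population std)} over the items answered for each trait."""
    per_trait = {}
    for item, answer in zip(items, answers):
        per_trait.setdefault(item.trait, []).append(item_score(answer, item.key))
    return {trait: (mean(s), pstdev(s)) for trait, s in per_trait.items()}


if __name__ == "__main__":
    # In practice the answers would be parsed from an LLM's chosen options.
    answers = ["A", "D", "B", "E"]
    print(score_inventory(ITEMS, answers))
```

Reporting a per-trait standard deviation alongside the mean is one way to gauge how consistently a model answers items measuring the same trait; this choice is likewise an assumption of the sketch.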