Large language models (LLMs) have become increasingly proficient at simulating various personality traits, a capability that supports applications such as role-playing. To further improve this capability, we present a neuron-based approach to personality trait induction in LLMs, with three main technical contributions. First, we construct PersonalityBench, a large-scale dataset for identifying and evaluating personality traits in LLMs. Grounded in the Big Five personality traits from psychology, it is designed to assess how well LLMs generate text that exhibits specific personality traits. Second, using PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by contrasting the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that manipulates the values of these identified personality-related neurons. This method enables fine-grained control over the traits exhibited by LLMs without training or modifying model parameters. Extensive experiments validate the efficacy of our neuron identification and trait induction methods. Notably, our approach achieves performance comparable to that of fine-tuned models, offering a more efficient and flexible solution for personality trait induction in LLMs. All resources mentioned in this paper are available at https://github.com/RUCAIBox/NPTI.
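The abstract does not specify how neurons are scored or how their values are manipulated, so the following is only a minimal Python sketch of one plausible reading, not the NPTI implementation. It assumes that activations have already been recorded (for example, from a feed-forward layer) on responses exhibiting the two opposite aspects of a trait, that a neuron counts as "active" when its activation exceeds 0, and that induction raises the activations of neurons associated with the desired aspect to a floor value. All function names and parameters (`delta`, `boost`, `target`) are hypothetical.

```python
from typing import Dict, List, Sequence


def activation_probability(acts: Sequence[Sequence[float]], threshold: float = 0.0) -> List[float]:
    """Fraction of samples on which each neuron's activation exceeds `threshold`.

    Assumed (hypothetical) layout: acts[i][j] is neuron j's activation on sample i.
    """
    n_samples = len(acts)
    n_neurons = len(acts[0])
    return [sum(1 for row in acts if row[j] > threshold) / n_samples for j in range(n_neurons)]


def select_trait_neurons(
    pos_acts: Sequence[Sequence[float]],
    neg_acts: Sequence[Sequence[float]],
    delta: float = 0.1,
) -> Dict[int, int]:
    """Contrast the two opposite aspects of one trait.

    Returns {neuron_index: +1 or -1}: +1 if the neuron fires more often on the
    positive aspect, -1 if more often on the negative aspect. Neurons whose
    activation probabilities differ by at most `delta` are ignored.
    """
    p_pos = activation_probability(pos_acts)
    p_neg = activation_probability(neg_acts)
    selected: Dict[int, int] = {}
    for j, (a, b) in enumerate(zip(p_pos, p_neg)):
        diff = a - b
        if abs(diff) > delta:
            selected[j] = 1 if diff > 0 else -1
    return selected


def induce(hidden: Sequence[float], selected: Dict[int, int], target: int, boost: float) -> List[float]:
    """Raise neurons tied to the `target` aspect (+1 or -1) to at least `boost`.

    Only activations are changed at inference time; no weights are modified.
    Neurons tied to the opposite aspect are left unchanged in this sketch.
    """
    out = list(hidden)
    for j, sign in selected.items():
        if sign == target:
            out[j] = max(out[j], boost)
    return out


# Toy usage: 3 samples x 3 neurons for each aspect of a single trait.
pos_acts = [[0.5, -0.2, 0.3], [0.7, -0.1, -0.4], [0.2, 0.1, 0.6]]
neg_acts = [[-0.3, 0.4, 0.2], [-0.1, 0.5, -0.3], [0.1, 0.6, 0.5]]
selected = select_trait_neurons(pos_acts, neg_acts, delta=0.1)
print(selected)  # {0: 1, 1: -1}
print(induce([0.0, 0.9, -0.5], selected, target=1, boost=1.0))  # [1.0, 0.9, -0.5]
```

Contrasting the two aspects, rather than comparing one aspect against generic text, isolates neurons whose behavior differs specifically along the trait axis; in a real model the `induce` step would be applied to hidden activations during the forward pass, for instance through a framework's hook mechanism.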