Large Language Models (LLMs) are increasingly used by both candidates and employers in recruitment. This raises numerous ethical concerns, particularly regarding the lack of transparency of these "black-box" models. Previous studies have sought to increase transparency by investigating the personality traits of LLMs, but most have done so by giving the models personality assessments to complete. In contrast, this study seeks a better understanding of such models by examining how their outputs vary with different input prompts. Specifically, we use a novel elicitation approach that measures personality from the language in the models' outputs, using prompts derived from common interview questions as well as prompts designed to elicit particular Big Five personality traits, to examine whether the models are susceptible to trait activation as humans are. To do so, we repeatedly prompted multiple LMs of different parameter sizes, including Llama-2, Falcon, Mistral, Bloom, GPT, OPT, and XLNet (base and fine-tuned versions), and assessed their personality using classifiers trained on the myPersonality dataset. Our results reveal that, in general, all LLMs demonstrate high openness and low extraversion. However, whereas LMs with fewer parameters exhibit similar personality traits to one another, newer LMs and LMs with more parameters exhibit a broader range of personality traits, with increased agreeableness, emotional stability, and openness. Furthermore, a greater number of parameters is positively associated with openness and conscientiousness. Moreover, fine-tuned models exhibit minor modulations in their personality traits, contingent on the dataset. Implications and directions for future research are discussed.