Large Language Models (LLMs) are increasingly used by both candidates and employers in recruitment. However, their use raises numerous ethical concerns, particularly regarding the lack of transparency of these "black-box" models. Although previous studies have sought to increase the transparency of these models by investigating their personality traits, most have done so by administering personality assessments for the models to complete. In contrast, this study seeks a better understanding of such models by examining how their outputs vary with different input prompts. Specifically, we use a novel elicitation approach that measures a model's personality from the language of its outputs, using prompts derived from common interview questions as well as prompts designed to elicit particular Big Five personality traits, thereby examining whether the models are susceptible to trait activation as humans are. To do so, we repeatedly prompted multiple language models of different parameter sizes, including Llama-2, Falcon, Mistral, Bloom, GPT, OPT, and XLNet (base and fine-tuned versions), and examined their personality using classifiers trained on the myPersonality dataset. Our results reveal that, in general, all LLMs demonstrate high openness and low extraversion. However, whereas models with fewer parameters exhibit similar personality traits, newer models with more parameters exhibit a broader range of traits, with increased agreeableness, emotional stability, and openness. Furthermore, a greater number of parameters is positively associated with openness and conscientiousness. Moreover, fine-tuned models exhibit minor modulations in their personality traits, contingent on the fine-tuning dataset. Implications and directions for future research are discussed.