The recent surge in applications built on Large Language Models (LLMs) has raised expectations that LLMs accommodate many personas and represent a broad spectrum of perspectives. An important first step towards meeting this demand is to align language models with specific personas, whether groups of users or individuals. Towards this goal, we first present a new conceptualization of a persona. Moving beyond the traditional reliance on demographics such as age, gender, or political party affiliation, we introduce a data-driven methodology for defining personas, built on collaborative filtering. In this methodology, users are embedded into a continuous vector space based on their opinions and clustered into cohorts that hold coherent views on specific inquiries. Compared with simply using demographic groups, this allows a more nuanced understanding of the latent social groups present in the overall population and broadens the applicability of model steerability. Finally, we present an efficient method to steer LLMs towards a particular persona. We learn a soft-prompting model that maps the continuous representation of a user into a sequence of virtual tokens which, when prepended to the LLM input, enable the LLM to produce responses aligned with that user. Our results show that our steerability algorithm outperforms a collection of baselines.
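The pipeline described above (embed users from their opinions, cluster them into cohorts, then map a user vector to virtual tokens that are prepended to the model input) can be illustrated with a minimal sketch. The abstract does not specify the factorization, the clustering algorithm, or the architecture of the soft-prompting model, so everything concrete here is an assumption made for illustration: truncated SVD stands in for collaborative filtering, a plain k-means for the cohort clustering, and an untrained linear map for the soft-prompting model. All function names, the toy response matrix, and the dimensions are hypothetical.

```python
import numpy as np

def embed_users(responses, dim):
    """Embed users from a (users x questions) opinion matrix.
    Assumption: truncated SVD as a stand-in for collaborative filtering."""
    centered = responses - responses.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

def soft_prompt(user_vec, weight, bias, n_tokens, d_model):
    """Map a user vector to n_tokens virtual-token embeddings.
    Assumption: a single linear layer; in practice the weights are learned."""
    return (user_vec @ weight + bias).reshape(n_tokens, d_model)

def prepend(virtual_tokens, input_embeds):
    """Prepend virtual tokens to the embedded input sequence."""
    return np.concatenate([virtual_tokens, input_embeds], axis=0)

# Toy data (hypothetical): 6 users answering 4 questions on a 1-5 scale.
responses = np.array([
    [1, 1, 5, 5],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [5, 5, 1, 1],
    [5, 4, 1, 2],
    [4, 5, 2, 1],
], dtype=float)

user_vecs = embed_users(responses, dim=2)
labels, _ = kmeans(user_vecs, k=2)

rng = np.random.default_rng(0)
n_tokens, d_model = 3, 16
weight = rng.normal(scale=0.02, size=(2, n_tokens * d_model))  # untrained placeholder
bias = np.zeros(n_tokens * d_model)

prompt = soft_prompt(user_vecs[0], weight, bias, n_tokens, d_model)
input_embeds = rng.normal(size=(5, d_model))  # stands in for embedded input tokens
steered = prepend(prompt, input_embeds)
print(labels, steered.shape)
```

In this sketch the language model itself is left out: only the soft-prompting parameters (`weight` and `bias` here) would be trained, with the resulting sequence fed to the LLM in place of its usual input embeddings. Passing a cohort centroid instead of an individual user vector would steer towards a group rather than a single user.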