Large Language Models (LLMs) behave non-deterministically, and prompting has become a common method for steering their outputs. A popular strategy is to assign a persona to the model to produce more varied, context-sensitive responses, similar to how responses vary across human individuals. Against the expectation that persona prompting yields a wide range of opinions, our experiments show that LLMs keep consistent value orientations. We observe a persistent inertia in their responses, where certain moral and value dimensions (especially harm avoidance and fairness) stay skewed in one direction across persona settings. To study this, we use role-play at scale, which pairs randomized persona prompts with a macro-level analysis of model outputs. Our results point to strong internal biases and value preferences in LLMs, which we call value orientation and inertia. These models warrant scrutiny and adjustment before use in applications where balanced outputs matter.