Evaluating Large Language Model Biases in Persona-Steered Generation

The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g., political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with the persona's demographic rather than the target stance. The models we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.
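
To make the definition of an incongruous persona concrete, the following is a minimal sketch, not the authors' implementation. It assumes a persona is a (demographic, stance) pair, that a persona counts as incongruous when fewer than half of surveyed members of the demographic hold the stance, and that steerability is the fraction of generations judged to express the target stance. The 0.5 threshold, the names `Persona`, `is_incongruous`, and `steerability`, and all numbers are hypothetical illustrations, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    demographic: str  # e.g. "political liberal"
    stance: str       # e.g. "supports increased military spending"


def is_incongruous(persona, survey_support, threshold=0.5):
    """Return True if the stance is a minority view within the demographic.

    survey_support maps (demographic, stance) to the fraction of survey
    respondents in that demographic who hold the stance. The threshold
    is an assumption made for this sketch.
    """
    return survey_support[(persona.demographic, persona.stance)] < threshold


def steerability(matches_target):
    """Fraction of generations judged to express the target stance."""
    if not matches_target:
        raise ValueError("need at least one labeled generation")
    return sum(matches_target) / len(matches_target)


# Toy survey fractions, invented for illustration only.
survey_support = {
    ("political liberal", "supports increased military spending"): 0.3,
    ("political liberal", "opposes increased military spending"): 0.7,
}

incongruous = Persona("political liberal", "supports increased military spending")
congruous = Persona("political liberal", "opposes increased military spending")

# Toy stance labels for four generations per persona (hypothetical).
congruous_labels = [True, True, True, False]
incongruous_labels = [True, False, True, False]

# Absolute steerability gap between the two personas.
gap = steerability(congruous_labels) - steerability(incongruous_labels)
print(is_incongruous(incongruous, survey_support), gap)
```

Whether the reported 9.7% is an absolute or a relative difference is not stated in the abstract; the sketch computes an absolute gap purely for illustration.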