Uncovering latent values and opinions embedded in large language models (LLMs) can help identify biases and mitigate potential harm. Recently, this has been approached by prompting LLMs with survey questions and quantifying the stances of their outputs towards morally and politically charged statements. However, the stances LLMs generate can vary greatly depending on how they are prompted, and there are many ways to argue for or against a given position. In this work, we address this by analysing a large and robust dataset of 156k LLM responses to the 62 propositions of the Political Compass Test (PCT), generated by 6 LLMs using 420 prompt variations. We perform coarse-grained analysis of the generated stances and fine-grained analysis of the plain-text justifications for those stances. For the fine-grained analysis, we propose to identify tropes in the responses: semantically similar phrases that recur consistently across different prompts, revealing natural patterns in the text that a given LLM is prone to produce. We find that demographic features added to prompts significantly affect outcomes on the PCT, reflecting bias, and that results diverge when eliciting closed-form vs. open-domain responses. Additionally, trope analysis of the plain-text rationales shows that similar justifications are generated repeatedly across models and prompts, even with disparate stances.
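The trope-identification idea above can be illustrated with a minimal lexical-overlap sketch: greedily cluster phrases by token-set Jaccard similarity and keep clusters that span multiple prompts. This is only an assumed toy stand-in (function names and the similarity measure are hypothetical); the abstract does not specify how semantic similarity is actually computed.

```python
def tokens(phrase):
    """Lowercased token set of a phrase (toy proxy for a semantic embedding)."""
    return set(phrase.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)

def find_tropes(responses, sim=0.5, min_prompts=2):
    """responses: list of (prompt_id, phrase) pairs.

    Greedily assign each phrase to the first cluster whose representative
    token set is at least `sim`-similar; a cluster that spans at least
    `min_prompts` distinct prompts counts as a recurrent trope.
    """
    clusters = []  # each: {"rep": token set, "phrases": [...], "prompts": {...}}
    for pid, phrase in responses:
        t = tokens(phrase)
        for c in clusters:
            if jaccard(t, c["rep"]) >= sim:
                c["phrases"].append(phrase)
                c["prompts"].add(pid)
                break
        else:
            clusters.append({"rep": t, "phrases": [phrase], "prompts": {pid}})
    return [c for c in clusters if len(c["prompts"]) >= min_prompts]
```

In practice one would replace the token-set overlap with sentence embeddings and a proper clustering algorithm, but the cross-prompt recurrence filter is the part that distinguishes a trope from a one-off phrase.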