The widespread application of Large Language Models (LLMs) across diverse tasks and fields has necessitated aligning these models with human values and preferences. Given the variety of human value alignment approaches, ranging from Reinforcement Learning from Human Feedback (RLHF) to constitutional learning, there is an urgent need to understand the scope and nature of the human values injected into these models before their release. There is also a need for model alignment that does not require a costly, large-scale human annotation effort. We propose UniVaR, a high-dimensional representation of human value distributions in LLMs that is orthogonal to model architecture and training data. Trained on the value-relevant outputs of eight multilingual LLMs and tested on the outputs of four multilingual LLMs, namely LLaMA2, ChatGPT, JAIS, and Yi, we show that UniVaR is a powerful tool for comparing the distributions of human values embedded in LLMs drawn from different language sources. Through UniVaR, we explore how different LLMs prioritize various values across languages and cultures, shedding light on the complex interplay between human values and language modeling.
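To make the core idea concrete, the sketch below illustrates the kind of comparison UniVaR enables: embedding value-relevant answers from two LLMs and measuring how far apart their value distributions lie. This is a minimal illustration, not the authors' training procedure; the embedding model (`all-MiniLM-L6-v2`), the example answers, and the MMD-style distance are all illustrative assumptions.

```python
# Hypothetical sketch: compare two LLMs' value distributions by embedding
# their answers to a value-eliciting question and computing an RBF-kernel
# MMD^2 between the two embedding sets. NOT the UniVaR method itself.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

# Made-up answers from two different LLMs to the same value-eliciting prompt.
answers_model_a = [
    "Family obligations should come before personal ambitions.",
    "Respecting elders is a core duty in any community.",
]
answers_model_b = [
    "Individual freedom matters more than tradition.",
    "People should pursue their own goals first.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
emb_a = encoder.encode(answers_model_a)  # shape: (n_a, d)
emb_b = encoder.encode(answers_model_b)  # shape: (n_b, d)

def mmd_rbf(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased MMD^2 estimate with an RBF kernel between two embedding sets."""
    def kernel(p: np.ndarray, q: np.ndarray) -> np.ndarray:
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

# Larger values suggest the two models' value-relevant outputs diverge more.
print(f"MMD^2 between value distributions: {mmd_rbf(emb_a, emb_b):.4f}")
```

In this framing, a generic sentence encoder stands in for the learned UniVaR representation; the paper's contribution is a representation trained specifically on value-relevant outputs so that such distances reflect value differences rather than surface-level linguistic ones.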