The emergent capabilities of Large Language Models (LLMs) have made it crucial to align their values with those of humans. However, current methodologies typically treat value as an attribute to be assigned to LLMs, and pay little attention to the ability to pursue a value or to the importance of transferring heterogeneous values in specific practical applications. In this paper, we propose a Heterogeneous Value Alignment Evaluation (HVAE) system, designed to assess how successfully LLMs can be aligned with heterogeneous values. Specifically, our approach first adopts the Social Value Orientation (SVO) framework from social psychology, which measures how much weight a person attaches to the welfare of others relative to their own. We then assign different social values to the LLMs and measure whether their behaviors align with the induced values. We conduct evaluations with a new automatic metric, \textit{value rationality}, which represents the ability of LLMs to align with specific values. Evaluating the value rationality of five mainstream LLMs, we observe a propensity in LLMs towards neutral values over pronounced personal values. By examining the behavior of these LLMs, we contribute deeper insight into the value alignment of LLMs within a heterogeneous value system.
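To make the SVO notion of "weight on others' welfare relative to one's own" concrete, the following is a minimal sketch of how an SVO angle is commonly computed and bucketed in the social psychology literature (the SVO Slider Measure convention). It is an illustration only, not the \textit{value rationality} metric itself, whose definition is not given in this abstract. The function names, the payoff centre of 50, and the category boundaries (57.15, 22.45, and -12.04 degrees) are assumptions taken from that convention rather than from this paper.

```python
import math


def svo_angle(mean_self: float, mean_other: float, center: float = 50.0) -> float:
    """Return the SVO angle in degrees from mean payoffs to self and other.

    Assumption: payoffs lie on a 0-100 scale centred at 50, as in the
    SVO Slider Measure. This is not necessarily the setup used by HVAE.
    """
    return math.degrees(math.atan2(mean_other - center, mean_self - center))


def svo_category(angle_deg: float) -> str:
    """Map an SVO angle to one of four commonly used orientation labels.

    The thresholds are the conventional Slider Measure boundaries and are
    treated here as assumptions.
    """
    if angle_deg > 57.15:
        return "altruistic"
    if angle_deg > 22.45:
        return "prosocial"
    if angle_deg > -12.04:
        return "individualistic"
    return "competitive"


if __name__ == "__main__":
    # Hypothetical allocations: (mean payoff to self, mean payoff to other).
    for self_pay, other_pay in [(85, 85), (100, 50), (85, 15), (50, 100)]:
        angle = svo_angle(self_pay, other_pay)
        print(self_pay, other_pay, round(angle, 2), svo_category(angle))
```

Under these assumptions, an agent that allocates equal above-centre payoffs to itself and to the other party has an angle of 45 degrees and is labelled prosocial, whereas one that maximizes its own payoff while leaving the other at the centre has an angle of 0 degrees and is labelled individualistic. An evaluation in the spirit of HVAE could compare such a behaviorally derived label with the value the model was instructed to hold, although the exact comparison used by the authors is not specified here.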