Recent advances in Large Language Models (LLMs) have heightened concerns about their potential misalignment with human values. However, evaluating how well LLMs grasp these values is difficult because of their intricate and adaptable nature. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present the Value Understanding Measurement (VUM) framework, which quantitatively assesses both "know what" and "know why" by measuring the discriminator-critique gap related to human values. Using the Schwartz Value Survey, we specify the values to be evaluated and build a dialogue dataset on the scale of thousands of examples with GPT-4. Our assessment examines two things: how well the values expressed in an LLM's outputs align with baseline answers, and how well the LLM's stated reasons for recognizing a value align with GPT-4's annotations. We evaluate five representative LLMs and provide strong evidence that scaling significantly affects "know what" but has little effect on "know why", which remains consistently high. This suggests that LLMs may craft plausible explanations from the provided context without truly understanding the values involved, which indicates a potential risk.