Recent advances in Large Language Models (LLMs) have heightened concerns about their potential misalignment with human values. However, evaluating how well LLMs grasp these values is difficult because human values are intricate and adaptable. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present the Value Understanding Measurement (VUM) framework, which quantitatively assesses both "know what" and "know why" by measuring the discriminator-critique gap related to human values. Using the Schwartz Value Survey, we specify the values to evaluate and build a thousand-scale dialogue dataset with GPT-4. Our assessment examines both how well the values expressed in an LLM's outputs align with baseline answers and how well the LLM's stated reasons for recognizing a value align with GPT-4's annotations. We evaluate five representative LLMs and provide strong evidence that the scaling law significantly affects "know what" but has little effect on "know why", which remains consistently high. This may further suggest that LLMs can craft plausible explanations from the provided context without truly understanding the underlying values, indicating potential risks.