While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess the deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 languages, systematically organized into 7 core domains grounded in Schwartz's Theory of Basic Human Values and categorized into easy and hard levels for discriminative evaluation. We further propose a unique two-stage annotation framework that first identifies whether an issue falls under global consensus (e.g., human rights) or pluralism (e.g., religion), and subsequently conducts a multi-party evaluation of the latent values embedded within the content. Systematic evaluations on X-Value reveal that current SOTA LLMs exhibit deficiencies in cross-lingual values assessment ($Acc < 77\%$), with significant performance disparities across different languages ($\Delta Acc > 20\%$). This work highlights the urgent need to improve the nuanced, values-aware content assessment capability of LLMs. Our X-Value is available at: https://huggingface.co/datasets/Whitolf/X-Value.