As LLMs are deployed globally, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: they rely on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, they overlook subcultural heterogeneity, and they mismatch real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares the distribution of human-written text with that of LLM-generated outputs. DOVE uses a rate-distortion variational optimization objective to construct a compact value codebook from 10K documents, mapping text into a structured value space that filters out semantic noise. Alignment is then measured with unbalanced optimal transport, which captures intra-cultural distributional structure and sub-group diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while remaining highly reliable with as few as 500 samples per culture.
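To make the alignment step more concrete, the following is a minimal sketch of entropic unbalanced optimal transport between two histograms, which we assume here are the shares of human-written and LLM-generated documents assigned to each value-codebook entry. It uses the standard KL-relaxed Sinkhorn scaling iterations. The histogram representation, the cost matrix, `eps`, `tau`, and the function names are all illustrative assumptions, not DOVE's implementation. The returned score is the transport cost plus marginal-mismatch penalties evaluated at the entropic plan, with the entropy term left out.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gen_kl(x: Vector, y: Vector) -> float:
    """Generalized KL divergence between nonnegative vectors (masses need not sum to 1).

    Assumes y[i] > 0 wherever x[i] > 0.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            total += xi * math.log(xi / yi)
        total += yi - xi
    return total


def unbalanced_sinkhorn(a: Vector, b: Vector, cost: Matrix,
                        eps: float = 0.05, tau: float = 1.0,
                        n_iter: int = 500) -> Matrix:
    """Entropic unbalanced OT plan with KL-relaxed marginals (scaling algorithm).

    eps: entropic regularization strength; tau: marginal-relaxation strength
    (larger tau enforces the marginals a and b more strictly). Both are hypothetical defaults.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    fi = tau / (tau + eps)  # exponent that softens the usual Sinkhorn marginal updates
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** fi for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** fi for j in range(m)]
    # Plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def uot_score(a: Vector, b: Vector, cost: Matrix,
              eps: float = 0.05, tau: float = 1.0) -> float:
    """Transport cost plus tau-weighted marginal mismatch at the entropic plan (hypothetical score)."""
    plan = unbalanced_sinkhorn(a, b, cost, eps, tau)
    n, m = len(a), len(b)
    transport = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    row = [sum(plan[i]) for i in range(n)]
    col = [sum(plan[i][j] for i in range(n)) for j in range(m)]
    return transport + tau * (gen_kl(row, a) + gen_kl(col, b))


if __name__ == "__main__":
    # Hypothetical 2-entry value codebook; cost[i][j] is the distance between codes i and j.
    cost = [[0.0, 1.0], [1.0, 0.0]]
    human = [0.5, 0.5]
    print(uot_score(human, human, cost))       # identical histograms: close to 0
    print(uot_score(human, [0.9, 0.1], cost))  # skewed model histogram: noticeably larger
```

Relaxing the marginals means that a model which under- or over-produces a sub-group's values pays a graded KL penalty instead of being forced to transport mass it does not have. This is one plausible reason to prefer unbalanced over balanced OT when sub-group proportions matter.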