Quantifying the Salience of Geo-Cultural Values for Pluralistic Safety Alignment

Safe global deployment of AI models requires alignment with human values that vary across cultures. Yet rater pools in safety evaluation datasets remain largely geographically homogeneous, failing to capture geo-cultural differences. Further, it remains unclear whether such differences persist after controlling for demographics such as age, gender, and ethnicity. Through a meta-analysis of safety datasets, we find that most do not report geo-cultural information, and those that do lack a unified methodology to jointly analyze geo-cultural and demographic correlates. Using the Inglehart-Welzel dimensions of cross-cultural variation, we demonstrate via multilevel modeling that cultural zone membership explains variance in safety ratings beyond standard demographics (p<0.05 across 6 datasets). Moreover, our analysis indicates that roughly 10% of items in the datasets we examined are culturally sensitive: likely to be misclassified as safe without adequate cultural representation. We evaluate LLMs as both rater surrogates and triage tools, finding that current LLMs do not reliably stand in for raters, though they can help prioritize culturally sensitive items for human annotation. Our findings motivate more culturally pluralistic safety evaluation and offer practical takeaways to support it.
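To make the variance claim concrete, the following is a minimal sketch of the quantity a multilevel model estimates at the group level: the share of rating variance attributable to cultural zone (the intraclass correlation, ICC). It is not the authors' model. It uses a one-way random-effects ANOVA estimator and omits the demographic covariates (age, gender, ethnicity) that the abstract says are controlled for. The function name `zone_icc`, the zone labels, and the toy ratings are all hypothetical.

```python
from collections import defaultdict


def zone_icc(ratings):
    """Estimate ICC(1) for safety ratings grouped by cultural zone.

    ratings: iterable of (zone_label, numeric_rating) pairs.
    Returns the one-way ANOVA estimate of the fraction of rating
    variance that lies between zones. The estimate can be negative
    when zones differ less than chance alone would predict.
    """
    groups = defaultdict(list)
    for zone, score in ratings:
        groups[zone].append(float(score))

    n_total = sum(len(v) for v in groups.values())
    n_groups = len(groups)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two zones and some repeated ratings")

    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {z: sum(v) / len(v) for z, v in groups.items()}

    # Between-zone and within-zone sums of squares.
    ss_between = sum(len(v) * (means[z] - grand_mean) ** 2
                     for z, v in groups.items())
    ss_within = sum((x - means[z]) ** 2
                    for z, v in groups.items() for x in v)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective group size, which handles unequal numbers of raters per zone.
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) \
        / (n_groups - 1)

    denom = ms_between + (n0 - 1) * ms_within
    if denom == 0:
        return 0.0  # no variance at all in the ratings
    return (ms_between - ms_within) / denom


# Hypothetical toy data: 1 = rated unsafe, 0 = rated safe.
toy = [("zone_a", 1), ("zone_a", 1), ("zone_b", 0), ("zone_b", 0)]
icc = zone_icc(toy)
```

In this toy case all disagreement falls between zones, so the estimate is at its maximum. Testing whether zone membership matters beyond demographics, as the abstract describes, would instead require a mixed model with demographic fixed effects and a zone-level random effect.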

