IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia

As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally diverse, low-resource languages remains poorly understood. We present the first systematic evaluation of LLM safety across 12 Indic languages, which are spoken by over 1.2 billion people but underrepresented in LLM training data. Using a dataset of 6,000 culturally grounded prompts spanning caste, religion, gender, health, and politics, we assess 10 leading LLMs on translated variants of each prompt. Our analysis reveals significant safety drift: cross-language agreement is just 12.8%, and SAFE rate variance exceeds 17% across languages. Some models over-refuse benign prompts in low-resource scripts or overflag politically sensitive topics, while others fail to flag unsafe generations. We quantify these failures using prompt-level entropy, category bias scores, and multilingual consistency indices. Our findings highlight critical safety generalization gaps in multilingual LLMs and show that safety alignment does not transfer evenly across languages. We release IndicSafe, the first benchmark to enable culturally informed safety evaluation for Indic deployments, and advocate for language-aware alignment strategies grounded in regional harms.
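The abstract names cross-language agreement, per-language SAFE rate, and prompt-level entropy but does not define them. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes each prompt receives one categorical label per language; the label names `UNSAFE` and `REFUSE`, the language codes, the function names, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def prompt_entropy(labels):
    """Shannon entropy (in bits) of the labels one prompt receives across languages.

    0.0 means every language produced the same label; higher values mean
    the model's behavior on this prompt varies more between languages.
    """
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def cross_language_agreement(labels_by_prompt):
    """Fraction of prompts whose label is identical in every language (assumed definition)."""
    agreed = sum(
        1 for labels in labels_by_prompt.values()
        if len(set(labels.values())) == 1
    )
    return agreed / len(labels_by_prompt)


def safe_rate_by_language(labels_by_prompt):
    """Share of prompts labeled SAFE, computed separately for each language."""
    totals, safe = Counter(), Counter()
    for labels in labels_by_prompt.values():
        for lang, label in labels.items():
            totals[lang] += 1
            if label == "SAFE":
                safe[lang] += 1
    return {lang: safe[lang] / totals[lang] for lang in totals}


# Hypothetical toy data: prompt id -> {language code: label}.
toy = {
    "p1": {"hi": "SAFE", "ta": "SAFE", "bn": "SAFE"},
    "p2": {"hi": "SAFE", "ta": "UNSAFE", "bn": "REFUSE"},
    "p3": {"hi": "UNSAFE", "ta": "UNSAFE", "bn": "SAFE"},
    "p4": {"hi": "SAFE", "ta": "REFUSE", "bn": "SAFE"},
}

agreement = cross_language_agreement(toy)
rates = safe_rate_by_language(toy)
entropies = {pid: prompt_entropy(list(labels.values())) for pid, labels in toy.items()}
print(agreement, rates, entropies)
```

In this toy data only `p1` receives the same label in all three languages, so the agreement is one prompt in four, and the gap between the highest and lowest per-language SAFE rate gives a rough sense of the drift the abstract describes.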

