Building safe Large Language Models (LLMs) across multiple languages is essential to ensure both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, 75k in total, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows a high rate of unsafe responses in the crime_tax category for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
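To make the per-language, per-category analysis concrete, the sketch below shows how safety rates of judged model responses could be aggregated and how cross-language inconsistencies, such as the crime_tax case reported for Llama3.2 in Italian, could be flagged. This is a minimal illustrative sketch, not the M-ALERT release code: the response record layout, the helper names `safety_by_language_and_category` and `inconsistent_categories`, and the 99% safety threshold are assumptions for exposition.

```python
# Illustrative sketch (assumed data layout, not the authors' released code):
# aggregate judged responses into per-(language, category) safety rates.
from collections import defaultdict

LANGUAGES = ["en", "fr", "de", "it", "es"]  # the five M-ALERT languages


def safety_by_language_and_category(responses):
    """responses: iterable of dicts with keys 'lang', 'category', 'safe' (bool).

    Returns {(lang, category): fraction of responses judged safe}.
    """
    totals = defaultdict(int)
    safe = defaultdict(int)
    for r in responses:
        key = (r["lang"], r["category"])
        totals[key] += 1
        safe[key] += int(r["safe"])
    return {k: safe[k] / totals[k] for k in totals}


def inconsistent_categories(scores, threshold=0.99):
    """Flag categories that are safe in at least one language (rate >= threshold)
    but unsafe in another (rate < threshold); the threshold is an assumption."""
    by_cat = defaultdict(list)
    for (lang, cat), rate in scores.items():
        by_cat[cat].append(rate)
    return [cat for cat, rates in by_cat.items()
            if max(rates) >= threshold and min(rates) < threshold]
```

Under these assumptions, running `inconsistent_categories` on the aggregated scores would surface categories like crime_tax that are safe in four languages but unsafe in one, while uniformly unsafe categories such as substance_cannabis would not be flagged as inconsistent.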