Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks consider safety in only one language, typically the majority language in the pretraining data, such as English. In this work, we build XSafety, the first multilingual safety benchmark for LLMs, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of common safety issues across 10 languages that span several language families. We use XSafety to empirically study the multilingual safety of 4 widely used LLMs, including both closed-API and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses to non-English queries than to English ones, indicating the need to develop safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods that improve the multilingual safety of ChatGPT by evoking safety knowledge and improving the cross-lingual generalization of safety alignment. Our prompting method significantly reduces the ratio of unsafe responses to non-English queries from 19.1% to 9.7%. We release our data at https://github.com/Jarviswang94/Multilingual_safety_benchmark.