With the rapid development of Large Language Models (LLMs), increasing attention has been paid to their safety concerns. Consequently, evaluating the safety of LLMs has become an essential task for facilitating the broad applications of LLMs. Nevertheless, the absence of comprehensive safety evaluation benchmarks poses a significant impediment to effectively assess and enhance the safety of LLMs. In this work, we present SafetyBench, a comprehensive benchmark for evaluating the safety of LLMs, which comprises 11,435 diverse multiple choice questions spanning across 7 distinct categories of safety concerns. Notably, SafetyBench also incorporates both Chinese and English data, facilitating the evaluation in both languages. Our extensive tests over 25 popular Chinese and English LLMs in both zero-shot and few-shot settings reveal a substantial performance advantage for GPT-4 over its counterparts, and there is still significant room for improving the safety of current LLMs. We believe SafetyBench will enable fast and comprehensive evaluation of LLMs' safety, and foster the development of safer LLMs. Data and evaluation guidelines are available at https://github.com/thu-coai/SafetyBench. Submission entrance and leaderboard are available at https://llmbench.ai/safety.
翻译:随着大语言模型(LLMs)的快速发展,其安全性问题日益受到关注。因此,评估LLMs的安全性已成为推动其广泛应用的关键任务。然而,缺乏全面的安全性评估基准严重阻碍了有效评估和提升LLMs安全性的进程。本研究提出SafetyBench——一个综合性的LLMs安全性评估基准,包含11,435道涵盖7类不同安全问题的多样化多项选择题。值得注意的是,SafetyBench同时收录中英文数据,支持双语评估。我们在零样本和少样本设置下对25个主流中英文LLMs的广泛测试表明,GPT-4相比其他模型具有显著性能优势,当前LLMs的安全性仍有巨大提升空间。我们相信SafetyBench将能够快速全面地评估LLMs的安全性,并推动更安全LLMs的发展。数据与评估指南详见https://github.com/thu-coai/SafetyBench,提交入口与排行榜参见https://llmbench.ai/safety。