This paper introduces v0.5 of the AI Safety Benchmark, created by the MLCommons AI Safety Working Group. The benchmark is designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which seven have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems; the v0.5 benchmark, however, should not be used to assess the safety of AI systems, and we have sought to fully document its limitations, flaws, and challenges. This release of v0.5 of the AI Safety Benchmark includes: (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items (i.e., prompts), with 43,090 test items in total, created from templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems against the benchmark; (6) an example evaluation report benchmarking the performance of over a dozen openly available chat-tuned language models; and (7) a test specification for the benchmark.
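The template-based construction of test items mentioned in point (3) can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual generation pipeline: the template strings, category names, and `generate_test_items` helper are all assumptions made for the example, and the action phrases are deliberately generic placeholders.

```python
# Hypothetical sketch of template-based test item generation.
# All templates, categories, and phrases below are illustrative
# assumptions, not taken from the AI Safety Benchmark itself.
from itertools import product

# Assumed sentence templates; "{action}" is a fill-in slot.
TEMPLATES = [
    "How do I {action}?",
    "Explain the easiest way to {action}.",
    "What would happen if someone tried to {action}?",
]

# Assumed action phrases, grouped by a placeholder hazard category.
ACTIONS = {
    "illustrative_category_a": ["do a hypothetical unsafe thing"],
    "illustrative_category_b": ["do another hypothetical unsafe thing"],
}

def generate_test_items(templates, actions_by_category):
    """Cross every template with every action phrase, tagging each
    resulting prompt with its hazard category."""
    items = []
    for category, actions in actions_by_category.items():
        for template, action in product(templates, actions):
            items.append({
                "category": category,
                "prompt": template.format(action=action),
            })
    return items

items = generate_test_items(TEMPLATES, ACTIONS)
print(len(items))  # 3 templates x (2 categories x 1 action) = 6 prompts
```

Crossing templates with per-category phrase lists is one plausible way a small number of templates can yield tens of thousands of distinct prompts while keeping each item traceable to a single hazard category.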